
Pinned

LeetCUDA

Public

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda 9.6k 948

lite.ai.toolkit

Public

🛠A lite C++ AI toolkit: 100+ models with MNN, ORT and TRT, including Det, Seg, Stable-Diffusion, Face-Fusion, etc.🎉

C++ 4.4k 773

Awesome-LLM-Inference

Public

📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉

Python 5k 338

Awesome-DiT-Inference

Public

📚A curated list of Awesome Diffusion Inference Papers with Codes: Sampling, Cache, Quantization, Parallelism, etc.🎉

Python 518 25

torchlm

Public

💎An easy-to-use PyTorch library for face landmarks detection: training, evaluation, inference, and 100+ data augmentations.🎉

Python 267 27

ffpa-attn

Public

🤖FFPA: Extend FlashAttention-2 with Split-D, ~O(1) SRAM complexity for large headdim, 1.8x~3x↑🎉 vs SDPA EA.

Cuda 250 13

Showing 10 of 55 repositories

quack Public Forked from Dao-AILab/quack
A Quirky Assortment of CuTe Kernels
Python 0 Apache-2.0 79 0 0 Updated Feb 5, 2026
ffpa-attn Public
🤖FFPA: Extend FlashAttention-2 with Split-D, ~O(1) SRAM complexity for large headdim, 1.8x~3x↑🎉 vs SDPA EA.
Cuda 250 GPL-3.0 13 1 0 Updated Feb 5, 2026
LeetCUDA Public
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
Cuda 9,614 GPL-3.0 948 1 0 Updated Feb 5, 2026
diffusers Public Forked from huggingface/diffusers
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch and FLAX.
Python 0 Apache-2.0 6,821 0 0 Updated Feb 4, 2026
cutlass Public Forked from NVIDIA/cutlass
CUDA Templates and Python DSLs for High-Performance Linear Algebra
C++ 0 1,676 0 0 Updated Feb 4, 2026
ComfyUI-CacheDiT Public Forked from Jasonzzt/ComfyUI-CacheDiT
Cache-DiT Node for ComfyUI
Python 1 Apache-2.0 9 0 0 Updated Feb 3, 2026
nunchaku Public Forked from nunchaku-ai/nunchaku
[ICLR 2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
Python 2 Apache-2.0 220 0 0 Updated Feb 2, 2026
sglang Public Forked from sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
Python 0 Apache-2.0 4,369 0 0 Updated Jan 30, 2026
SageAttention Public Forked from thu-ml/SageAttention
Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.
Cuda 0 Apache-2.0 338 0 0 Updated Jan 22, 2026
cache-dit Public Forked from vipshop/cache-dit
A Unified and Flexible Inference Engine with Hybrid Cache Acceleration and Parallelism for 🤗DiTs.
Python 4 Apache-2.0 60 0 0 Updated Jan 21, 2026


xlite-dev
