xlite-dev
🛠 Repositories: lite.ai.toolkit | 📚Awesome-LLM-Inference | 📚LeetCUDA 🎧
🤖 ffpa-attn | 📈HGEMM | 🤗flux-faster | 📚Awesome-DiT-Inference 🖱
⚙️ RVM-Inference | lihang-notes(📚PDF, 200 Pages) | 💎torchlm 🔥
🤖 Contact: qyjdef@163.com | GitHub: DefTruth | Zhihu (知乎): DefTruth 📞
Pinned
- lite.ai.toolkit Public
🛠A lite C++ AI toolkit: 100+ models with MNN, ORT and TRT, including Det, Seg, Stable-Diffusion, Face-Fusion, etc.🎉
- Awesome-LLM-Inference Public
📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉
- Awesome-DiT-Inference Public
📚A curated list of Awesome Diffusion Inference Papers with Codes: Sampling, Cache, Quantization, Parallelism, etc.🎉
Repositories
- ffpa-attn Public
🤖FFPA: Extend FlashAttention-2 with Split-D, ~O(1) SRAM complexity for large headdim, 1.8x~3x↑🎉 vs SDPA EA.
- LeetCUDA Public
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
- diffusers Public Forked from huggingface/diffusers
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch and FLAX.
- cutlass Public Forked from NVIDIA/cutlass
CUDA Templates and Python DSLs for High-Performance Linear Algebra
- ComfyUI-CacheDiT
- nunchaku Public Forked from nunchaku-ai/nunchaku
[ICLR2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
- sglang Public Forked from sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
- SageAttention Public Forked from thu-ml/SageAttention
Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.
- cache-dit Public Forked from vipshop/cache-dit
A Unified and Flexible Inference Engine with Hybrid Cache Acceleration and Parallelism for 🤗DiTs.
