DefTruth (@DefTruth)
🎯 #pragma unroll
📚CUDA | LLM | VLM | Diffusion | AI Infra

DefTruth/README.md

logo

Pinned

  1. lite.ai.toolkit (Public)

    🛠 A lite C++ toolkit of 100+ Awesome AI models, supporting ORT, MNN, NCNN, TNN and TensorRT. 🎉🎉

    C++ · 4k stars · 736 forks

  2. vllm-project/vllm (Public)

    A high-throughput and memory-efficient inference and serving engine for LLMs

    Python · 42.3k stars · 6.4k forks

  3. Awesome-LLM-Inference (Public)

    📖A curated list of Awesome LLM/VLM Inference Papers with code: WINT8/4, Flash-Attention, Paged-Attention, MLA, Parallelism, Prefix-Cache, Chunked-Prefill, etc. 🎉🎉

    3.7k stars · 260 forks

  4. PaddlePaddle/FastDeploy (Public)

    ⚡️An easy-to-use and fast deep-learning model deployment toolkit for ☁️Cloud, 📱Mobile and 📹Edge, covering 20+ mainstream scenarios (image, video, text and audio) and 150+ SOTA models with end-to-end…

    C++ · 3.1k stars · 475 forks

  5. CUDA-Learn-Notes (Public)

    📚200+ Tensor/CUDA Core kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).

    Cuda · 2.9k stars · 307 forks

  6. ffpa-attn-mma (Public)

    📚FFPA (Split-D): Yet another Faster Flash Prefill Attention with O(1) GPU SRAM complexity for headdim > 256, ~2x↑🎉 vs SDPA EA.

    Cuda · 148 stars · 6 forks
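The profile motto, `#pragma unroll`, is CUDA's loop-unrolling hint, used throughout kernel code like that in CUDA-Learn-Notes. Below is a minimal, self-contained sketch of the directive on a compile-time-sized loop; the kernel name, sizes and values (`dot_kernel`, `K`, `N`) are illustrative assumptions, not code taken from any of the pinned repositories.

```cuda
// Illustrative sketch only: dot_kernel, K and N are made-up names/sizes,
// not code from the repositories above.
#include <cstdio>
#include <cuda_runtime.h>

// Each thread computes one fixed-length dot product. Because K is a
// compile-time constant, #pragma unroll lets nvcc fully unroll the loop,
// removing loop overhead and exposing more independent loads in flight.
template <int K>
__global__ void dot_kernel(const float* a, const float* b, float* out, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i >= n) return;
  float acc = 0.0f;
#pragma unroll
  for (int k = 0; k < K; ++k) {
    acc += a[i * K + k] * b[i * K + k];
  }
  out[i] = acc;
}

int main() {
  constexpr int K = 8, N = 1024;
  float *a, *b, *out;
  cudaMallocManaged(&a, N * K * sizeof(float));
  cudaMallocManaged(&b, N * K * sizeof(float));
  cudaMallocManaged(&out, N * sizeof(float));
  for (int i = 0; i < N * K; ++i) { a[i] = 1.0f; b[i] = 2.0f; }
  dot_kernel<K><<<(N + 255) / 256, 256>>>(a, b, out, N);
  cudaDeviceSynchronize();
  printf("out[0] = %.1f\n", out[0]);  // 8 * (1.0 * 2.0) = 16.0
  cudaFree(a); cudaFree(b); cudaFree(out);
  return 0;
}
```

Without the directive, nvcc may still unroll small constant-trip-count loops on its own; the pragma makes the intent explicit, and it also accepts an optional factor (e.g. `#pragma unroll 4`) when full unrolling would bloat the kernel.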

