- vipshop.com
- Guangzhou, China (UTC +08:00)
- https://github.com/DefTruth
- https://www.zhihu.com/people/qyjdef
Pinned
- lite.ai.toolkit: 🛠 A lite C++ toolkit of 100+ awesome AI models, supporting ORT, MNN, NCNN, TNN, and TensorRT. 🎉🎉
- vllm-project/vllm: A high-throughput and memory-efficient inference and serving engine for LLMs.
- Awesome-LLM-Inference: 📖 A curated list of awesome LLM/VLM inference papers with code: WINT8/4, Flash-Attention, Paged-Attention, MLA, parallelism, prefix cache, chunked prefill, etc. 🎉🎉
- PaddlePaddle/FastDeploy: ⚡️ An easy-to-use and fast deep-learning model deployment toolkit for ☁️ cloud, 📱 mobile, and 📹 edge, covering 20+ mainstream image, video, text, and audio scenarios with 150+ SOTA models, end-to-end…
- CUDA-Learn-Notes: 📚 200+ Tensor/CUDA Core kernels, including ⚡️ flash-attn-mma and ⚡️ hgemm with WMMA, MMA, and CuTe (98%~100% of the TFLOPS of cuBLAS/FA2 🎉🎉).
- ffpa-attn-mma: 📚 FFPA (Split-D): yet another faster flash prefill attention, with O(1) GPU SRAM complexity for headdim > 256; ~2x↑ 🎉 vs. SDPA EA.