DefTruth (@DefTruth)
🎯 #pragma unroll
📚CUDA | LLM | VLM | Diffusion | AI Infra

DefTruth/README.md

logo

Pinned

  1. lite.ai.toolkit (Public)

    🛠 A lite C++ toolkit of 100+ Awesome AI models, supporting ORT, MNN, NCNN, TNN and TensorRT. 🎉🎉

    C++ · 4k stars · 736 forks

  2. vllm-project/vllm (Public)

    A high-throughput and memory-efficient inference and serving engine for LLMs

    Python · 42.3k stars · 6.4k forks

  3. Awesome-LLM-Inference (Public)

    📖A curated list of Awesome LLM/VLM Inference Papers with code: WINT8/4, Flash-Attention, Paged-Attention, MLA, Parallelism, Prefix-Cache, Chunked-Prefill, etc. 🎉🎉

    3.7k stars · 260 forks

  4. PaddlePaddle/FastDeploy (Public)

    ⚡️An easy-to-use and fast deep-learning model deployment toolkit for ☁️Cloud, 📱Mobile and 📹Edge, covering 20+ mainstream scenarios (image, video, text and audio) and 150+ SOTA models with end-to-end…

    C++ · 3.1k stars · 475 forks

  5. CUDA-Learn-Notes (Public)

    📚200+ Tensor/CUDA Core kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).

    Cuda · 2.9k stars · 307 forks

  6. ffpa-attn-mma (Public)

    📚FFPA (Split-D): Yet another Faster Flash Prefill Attention with O(1) GPU SRAM complexity for headdim > 256, ~2x↑🎉 vs SDPA EA.

    Cuda · 148 stars · 6 forks
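The profile motto, `#pragma unroll`, is CUDA's loop-unrolling hint, used throughout kernel code like that in CUDA-Learn-Notes. Below is a minimal, self-contained sketch of the directive on a compile-time-sized loop; the kernel name, sizes and values (`dot_kernel`, `K`, `N`) are illustrative assumptions, not code taken from any of the pinned repositories.

```cuda
// Illustrative sketch only: dot_kernel, K and N are made-up names/sizes,
// not code from the repositories above.
#include <cstdio>
#include <cuda_runtime.h>

// Each thread computes one fixed-length dot product. Because K is a
// compile-time constant, #pragma unroll lets nvcc fully unroll the loop,
// removing loop overhead and exposing more independent loads in flight.
template <int K>
__global__ void dot_kernel(const float* a, const float* b, float* out, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i >= n) return;
  float acc = 0.0f;
#pragma unroll
  for (int k = 0; k < K; ++k) {
    acc += a[i * K + k] * b[i * K + k];
  }
  out[i] = acc;
}

int main() {
  constexpr int K = 8, N = 1024;
  float *a, *b, *out;
  cudaMallocManaged(&a, N * K * sizeof(float));
  cudaMallocManaged(&b, N * K * sizeof(float));
  cudaMallocManaged(&out, N * sizeof(float));
  for (int i = 0; i < N * K; ++i) { a[i] = 1.0f; b[i] = 2.0f; }
  dot_kernel<K><<<(N + 255) / 256, 256>>>(a, b, out, N);
  cudaDeviceSynchronize();
  printf("out[0] = %.1f\n", out[0]);  // 8 * (1.0 * 2.0) = 16.0
  cudaFree(a); cudaFree(b); cudaFree(out);
  return 0;
}
```

Without the directive, nvcc may still unroll small constant-trip-count loops on its own; the pragma makes the intent explicit, and it also accepts an optional factor (e.g. `#pragma unroll 4`) when full unrolling would bloat the kernel.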

