- Tsinghua University
- Beijing, China
Pinned
- thu-ml/SageAttention: Quantized attention that achieves 2-5x and 3-11x speedups over FlashAttention and xformers respectively, without losing end-to-end metrics across language, image, and video models.
- thu-ml/SpargeAttn: SpargeAttention, a training-free sparse attention method that can accelerate inference for any model.
- SPH_Project: An SPH-based fluid simulation featuring large-scale simulation, rigid-fluid coupling, and high-viscosity fluids.
- mit-han-lab/llm-awq: [MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration.
- thu-nics/MoA: [CoLM'25] The official implementation of the paper "MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression".
- mit-han-lab/omniserve: [MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention.