- Tsinghua University
- Beijing, China
Pinned
- thu-ml/SageAttention: Quantized attention that achieves 2-5x and 3-11x speedups over FlashAttention and xformers respectively, without losing end-to-end metrics across language, image, and video models.
- thu-ml/SpargeAttn: SpargeAttention, a training-free sparse attention method that can accelerate inference for any model.
- SPH_Project: An SPH-based fluid simulation featuring large-scale simulation, rigid-fluid coupling, and high-viscosity fluids.
- mit-han-lab/llm-awq: [MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration.
- thu-nics/MoA: [CoLM'25] The official implementation of the paper "MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression".
- mit-han-lab/omniserve: [MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention.