Pinned
- llm-compressor: Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
- speculators: A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM
- semantic-router: System-Level Intelligent Router for Mixture-of-Models at Cloud, Data Center, and Edge
Repositories
- flash-attention (forked from Dao-AILab/flash-attention): Fast and memory-efficient exact attention
- semantic-router: System-Level Intelligent Router for Mixture-of-Models at Cloud, Data Center, and Edge