Neural Magic
Neural Magic empowers developers to optimize and deploy LLMs at scale. Our model compression and acceleration enable top performance with vLLM.
Pinned
- nm-vllm-certs Public
General information, model certifications, and benchmarks for nm-vllm enterprise distributions
Repositories
Showing 10 of 62 repositories
- compressed-tensors Public
A safetensors extension to efficiently store sparse quantized tensors on disk
- upstream-transformers Public (forked from huggingface/transformers)
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
- vllm-flash-attention Public (forked from vllm-project/flash-attention)
Fast and memory-efficient exact attention
- lm-evaluation-harness Public (forked from EleutherAI/lm-evaluation-harness)
A framework for few-shot evaluation of language models.