Neural Magic

Neural Magic empowers developers to optimize and deploy LLMs at scale. Our model compression and acceleration enable top performance with vLLM.

Pinned

nm-vllm-certs Public
General information, model certifications, and benchmarks for nm-vllm enterprise distributions
11 1
deepsparse Public
Sparsity-aware deep learning inference runtime for CPUs
Python 3.1k 181
sparseml Public
Libraries for applying sparsification recipes to neural networks with a few lines of code, enabling faster and smaller models
Python 2.1k 151
docs Public
Top-level directory for documentation and general content
MDX 120 7
sparsezoo Public
Neural network model repository for highly sparse and sparse-quantized models with matching sparsification recipes
Python 382 26
guidellm Public
Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs
Python 228 23

Repositories

Showing 10 of 62 repositories

vllm Public Forked from vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
Python 10 Apache-2.0 6,512 0 17 Updated Mar 24, 2025
compressed-tensors Public
A safetensors extension to efficiently store sparse quantized tensors on disk
Python 91 Apache-2.0 10 6 12 Updated Mar 24, 2025
upstream-transformers Public Forked from huggingface/transformers
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
Python 1 Apache-2.0 28,835 0 0 Updated Mar 23, 2025
guidellm Public
Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs
Python 228 Apache-2.0 23 24 9 Updated Mar 22, 2025
vllm-flash-attention Public Forked from vllm-project/flash-attention
Fast and memory-efficient exact attention
Python 2 BSD-3-Clause 1,567 0 0 Updated Mar 22, 2025
yolov5 Public Forked from ultralytics/yolov5
YOLOv5 in PyTorch > ONNX > CoreML > TFLite
Python 20 GPL-3.0 16,976 0 4 Updated Mar 22, 2025
lm-evaluation-harness Public Forked from EleutherAI/lm-evaluation-harness
A framework for few-shot evaluation of language models.
Python 3 MIT 2,251 0 1 Updated Mar 21, 2025
nm-actions Public
Neural Magic GitHub Actions (GHA)
Python 0 Apache-2.0 0 0 2 Updated Mar 20, 2025
mistral-evals Public Forked from mistralai/mistral-evals
Python 0 8 0 1 Updated Mar 20, 2025
pytest-nm-releng Public
Pytest plugin used by the Release Engineering team
Python 0 Apache-2.0 0 0 0 Updated Mar 19, 2025