quantization

Star

Here are 999 public repositories matching this topic...

Language:All

Filter by language

All999 Python493 Jupyter Notebook224 C++44 MATLAB30 C27 JavaScript18 Java16 Rust15 TypeScript9 Go7

Sort:Most stars

Sort options

Most stars Fewest stars Most forks Fewest forks Recently updated Least recently updated

hiyouga /LLaMA-Factory

Star64.1k

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

nlp agent ai transformers moe llama gpt lora quantization gemma fine-tuning peft large-language-models llm rlhf instruction-tuning qlora qwen deepseek llama3

UpdatedDec 16, 2025
Python

SYSTRAN /faster-whisper

Star19.5k

Faster Whisper transcription with CTranslate2

deep-learning inference transformer speech-recognition openai speech-to-text quantization whisper

UpdatedNov 19, 2025
Python

ymcui /Chinese-LLaMA-Alpaca

Star19k

中文LLaMA&Alpaca大语言模型+本地CPU/GPU训练部署 (Chinese LLaMA & Alpaca LLMs)

nlp llama lora quantization alpaca plm pre-trained-language-models large-language-models llm llama-2 alpaca-2

UpdatedJul 15, 2025
Python

UFund-Me /Qbot

Star15.5k

[🔥updating ...] AI 自动量化交易机器人(完全本地部署) AI-powered Quantitative Investment Research Platform. 📃 online docs:https://ufund-me.github.io/Qbot ✨ :news: qbot-mini:https://github.com/Charmve/iQuant

machine-learning deep-learning bitcoin blockchain fintech quantitative-finance trademarks quantization funds strategies backtest quantitative-trading pytrade qlib quant-trade trade-bot quant-trader

UpdatedJul 6, 2025
Jupyter Notebook

bitsandbytes-foundation /bitsandbytes

Sponsor

Star7.8k

Accessible large language models via k-bit quantization for PyTorch.

machine-learning pytorch quantization llm qlora

UpdatedDec 12, 2025
Python

kornelski /pngquant

Star5.5k

Lossy PNG compressor — pngquant command based on libimagequant library

c palette quality png png-compression conversion smaller stdin image-optimization quantization pngquant

UpdatedJul 7, 2025
C

AutoGPTQ /AutoGPTQ

Star5k

An easy-to-use LLMs quantization package with user-friendly apis, based on GPTQ algorithm.

nlp deep-learning transformers inference pytorch transformer quantization large-language-models llms

UpdatedApr 11, 2025
Python

OpenNMT /CTranslate2

Star4.2k

Fast inference engine for Transformer models

deep-neural-networks deep-learning cpp neon machine-translation openmp parallel-computing cuda inference avx intrinsics avx2 neural-machine-translation opennmt quantization gemm mkl thrust transformer-models onednn

UpdatedDec 5, 2025
C++

nunchaku-tech /nunchaku

Star3.5k

[ICLR2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models

flux lora quantization iclr diffusion-models mlsys comfyui genai iclr2025

UpdatedNov 17, 2025
Python

huggingface /optimum

Star3.2k

🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy to use hardware optimization tools

training optimization intel transformers inference pytorch quantization onnx tflite onnxruntime graphcore habana

UpdatedDec 17, 2025
Python

neuralmagic /deepsparse

Star3.2k

Sparsity-aware deep learning inference runtime for CPUs

nlp performance computer-vision inference machinelearning pruning object-detection pretrained-models quantization cpus onnx sparsification llm-inference deepsparse

UpdatedJun 2, 2025
Python

huawei-noah /Pretrained-Language-Model

Star3.2k

Pretrained language model and its related optimization techniques developed by Huawei Noah's Ark Lab.

pretrained-models quantization knowledge-distillation model-compression large-scale-distributed

UpdatedJan 22, 2024
Python

IntelLabs /nlp-architect

Star2.9k

A model library for exploring state-of-the-art deep learning topologies and techniques for optimizing Natural Language Processing neural networks

nlp deep-learning tensorflow nlu transformers pytorch deeplearning quantization bert dynet

UpdatedNov 7, 2022
Python

thu-ml /SageAttention

Star2.9k

[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.

cuda triton attention vit quantization video-generation mlsys inference-acceleration efficient-attention llm llm-infra video-generate

UpdatedDec 11, 2025
Cuda

aaron-xichen /pytorch-playground

Star2.7k

Base pretrained models and datasets in pytorch (MNIST, SVHN, CIFAR10, CIFAR100, STL10, AlexNet, VGG16, VGG19, ResNet, Inception, SqueezeNet)

pytorch quantization pytorch-tutorial pytorch-tutorials

UpdatedNov 22, 2022
Python

Build, personalize and control your own LLMs. From data pre-processing to fine-tuning, xTuring provides an easy way to personalize open-source LLMs. Join our discord community:https://discord.gg/TgHXuSJEk6

adapter deep-learning llama lora quantization language-model mistral fine-tuning peft finetuning mixed-precision gpt-2 gpt-j llm generative-ai gen-ai

UpdatedDec 2, 2025
Python

nunchaku-tech /ComfyUI-nunchaku

Star2.6k

ComfyUI Plugin of Nunchaku

flux quantization diffusion mlsys comfyui genai

UpdatedNov 8, 2025
Python

pytorch /ao

Star2.6k

PyTorch native quantization and sparsity for training and inference

training sparsity cuda inference optimizer pytorch transformer offloading llama quantization mx brrr dtypes float8

UpdatedDec 17, 2025
Python

intel /neural-compressor

Star2.5k

SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on PyTorch, TensorFlow, and ONNX Runtime

sparsity pruning quantization knowledge-distillation auto-tuning int8 low-precision quantization-aware-training post-training-quantization awq int4 large-language-models gptq smoothquant sparsegpt fp4 mxformat

UpdatedDec 17, 2025
Python

quic /aimet

Star2.5k

AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.

open-source machine-learning opensource deep-neural-networks compression deep-learning pruning quantization auto-ml network-quantization network-compression

UpdatedDec 17, 2025
Python

Improve this page

Add a description, image, and links to thequantization topic page so that developers can more easily learn about it.

Curate this topic

Add this topic to your repo

To associate your repository with thequantization topic, visit your repo's landing page and select "manage topics."

Learn more

Movatterモバイル変換

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

quantization

Here are 999 public repositories matching this topic...

hiyouga /LLaMA-Factory

SYSTRAN /faster-whisper

ymcui /Chinese-LLaMA-Alpaca

UFund-Me /Qbot

bitsandbytes-foundation /bitsandbytes

kornelski /pngquant

AutoGPTQ /AutoGPTQ

OpenNMT /CTranslate2

nunchaku-tech /nunchaku

huggingface /optimum

neuralmagic /deepsparse

huawei-noah /Pretrained-Language-Model

IntelLabs /nlp-architect

thu-ml /SageAttention

aaron-xichen /pytorch-playground

stochasticai /xTuring

nunchaku-tech /ComfyUI-nunchaku

pytorch /ao

intel /neural-compressor

quic /aimet

Improve this page

Add this topic to your repo