quantization

Here are 773 public repositories matching this topic...

Repositories by language: All 773, Python 379, Jupyter Notebook 168, C++ 37, MATLAB 30, C 22, JavaScript 15, Java 14, Rust 14, Go 6, TeX 6

Sort: Most stars

hiyouga/LLaMA-Factory

Star 48.6k

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

agent ai transformers moe llama gpt lora quantization language-model mistral fine-tuning peft large-language-models llm rlhf instruction-tuning chatglm qlora qwen llama3

Updated May 9, 2025
Python

ymcui/Chinese-LLaMA-Alpaca

Star 18.8k

Chinese LLaMA & Alpaca large language models, with local CPU/GPU training and deployment (Chinese LLaMA & Alpaca LLMs)

nlp llama lora quantization alpaca plm pre-trained-language-models large-language-models llm llama-2 alpaca-2

Updated Apr 30, 2024
Python

SYSTRAN/faster-whisper

Star 15.9k

Faster Whisper transcription with CTranslate2

deep-learning inference transformer speech-recognition openai speech-to-text quantization whisper

Updated Apr 29, 2025
Python

UFund-Me/Qbot

Star 11.3k

[🔥 updating ...] AI automated quantitative trading bot (fully local deployment). AI-powered Quantitative Investment Research Platform. 📃 Online docs: https://ufund-me.github.io/Qbot ✨ qbot-mini: https://github.com/Charmve/iQuant

machine-learning deep-learning bitcoin blockchain fintech quantitative-finance trademarks quantization funds strategies quantitative-trading pytrade qlib quant-trade trade-bot quant-trader

Updated May 5, 2025
Jupyter Notebook

bitsandbytes-foundation/bitsandbytes

Star 7k

Accessible large language models via k-bit quantization for PyTorch.

machine-learning pytorch quantization llm qlora

Updated May 9, 2025
Python

kornelski/pngquant

Star 5.4k

Lossy PNG compressor: the pngquant command, based on the libimagequant library

c palette quality png png-compression conversion smaller stdin image-optimization quantization pngquant

Updated Jan 23, 2025
C

AutoGPTQ/AutoGPTQ

Star 4.8k

An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.

nlp deep-learning transformers inference pytorch transformer quantization large-language-models llms

Updated Apr 11, 2025
Python

IntelLabs/distiller

Star 4.4k

Neural Network Distiller by Intel AI Lab: a Python package for neural network compression research. https://intellabs.github.io/distiller

deep-neural-networks jupyter-notebook pytorch regularization pruning quantization group-lasso distillation onnx truncated-svd network-compression pruning-structures early-exit automl-for-compression

Updated Apr 24, 2023
Jupyter Notebook

OpenNMT/CTranslate2

Star 3.8k

Fast inference engine for Transformer models

deep-neural-networks deep-learning cpp neon machine-translation openmp parallel-computing cuda inference avx intrinsics avx2 neural-machine-translation opennmt quantization gemm mkl thrust transformer-models onednn

Updated Apr 8, 2025
C++

neuralmagic/deepsparse

Star 3.1k

Sparsity-aware deep learning inference runtime for CPUs

nlp performance computer-vision inference machinelearning pruning object-detection pretrained-models quantization cpus onnx sparsification llm-inference deepsparse

Updated May 5, 2025
Python

huawei-noah/Pretrained-Language-Model

Star 3.1k

Pretrained language models and related optimization techniques developed by Huawei Noah's Ark Lab.

pretrained-models quantization knowledge-distillation model-compression large-scale-distributed

Updated Jan 22, 2024
Python

IntelLabs/nlp-architect

Star 2.9k

A model library for exploring state-of-the-art deep learning topologies and techniques for optimizing Natural Language Processing neural networks

nlp deep-learning tensorflow nlu transformers pytorch deeplearning quantization bert dynet

Updated Nov 7, 2022
Python

huggingface/optimum

Star 2.9k

🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy to use hardware optimization tools

training optimization intel transformers inference pytorch quantization onnx tflite onnxruntime graphcore habana

Updated May 9, 2025
Python

aaron-xichen/pytorch-playground

Star 2.7k

Base pretrained models and datasets in pytorch (MNIST, SVHN, CIFAR10, CIFAR100, STL10, AlexNet, VGG16, VGG19, ResNet, Inception, SqueezeNet)

pytorch quantization pytorch-tutorial pytorch-tutorials

Updated Nov 22, 2022
Python

Build, customize and control your own LLMs. From data pre-processing to fine-tuning, xTuring provides an easy way to personalize open-source LLMs. Join our Discord community: https://discord.gg/TgHXuSJEk6

adapter deep-learning llama lora quantization language-model alpaca mistral fine-tuning peft finetuning mixed-precision gpt-2 gpt-j llm generative-ai gen-ai

Updated Sep 23, 2024
Python

intel/neural-compressor

Star 2.4k

SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime

sparsity pruning quantization knowledge-distillation auto-tuning int8 low-precision quantization-aware-training post-training-quantization awq int4 large-language-models gptq smoothquant sparsegpt fp4 mxformat

Updated May 9, 2025
Python

dvmazur/mixtral-offloading

Star 2.3k

Run Mixtral-8x7B models in Colab or on consumer desktops

deep-learning pytorch offloading quantization language-model google-colab colab-notebook mixture-of-experts llm

Updated Apr 8, 2024
Python

quic/aimet

Star 2.3k

AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.

open-source machine-learning opensource deep-neural-networks compression deep-learning pruning quantization auto-ml network-quantization network-compression

Updated May 10, 2025
Python

666DZY666/micronet

Star 2.2k

micronet, a model compression and deployment library. Compression: 1. quantization: quantization-aware training (QAT), High-Bit (>2b) (DoReFa / Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference), Low-Bit (≤2b) / Ternary and Binary (TWN/BNN/XNOR-Net); post-training quantization (PTQ), 8-bit (TensorRT); 2. pruning: normal, reg…

pytorch pruning convolutional-networks quantization xnor-net tensorrt model-compression bnn neuromorphic-computing group-convolution onnx network-in-network tensorrt-int8-python dorefa twn network-slimming integer-arithmetic-only quantization-aware-training post-training-quantization batch-normalization-fuse

Updated May 6, 2025
Python

Efficient-ML/Awesome-Model-Quantization

Star 2.1k

A list of papers, docs, and code about model quantization. This repo aims to provide information for model quantization research, and we are continuously improving the project. PRs adding works (papers, repositories) that the repo has missed are welcome.

awesome deep-learning quantization model-compression model-acceleration binary-network binarized-neural-networks lightweight-neural-network model-quantization efficient-deep-learning

Updated Mar 4, 2025
