post-training-quantization
Here are 56 public repositories matching this topic...
SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on PyTorch, TensorFlow, and ONNX Runtime
- Updated Feb 20, 2026 - Python
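The entry above targets low-bit formats such as INT4. As a point of reference (not the library's actual API), here is a minimal pure-Python sketch of symmetric per-tensor INT4 weight quantization; all names are illustrative.

```python
# Symmetric per-tensor INT4 quantization sketch: map floats to
# integers in [-8, 7] with a single shared scale. Illustrative only,
# not tied to any repository listed on this page.

def quantize_int4_symmetric(weights):
    """Return (quantized ints in [-8, 7], scale)."""
    absmax = max(abs(w) for w in weights)
    scale = absmax / 7 if absmax > 0 else 1.0  # 7 = 2**(4-1) - 1
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Map quantized integers back to (approximate) floats."""
    return [v * scale for v in q]
```

The scale is chosen so the largest-magnitude weight maps to the edge of the signed 4-bit range; everything else is rounded to the nearest representable level.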
micronet, a model compression and deployment library. Compression: (1) quantization: quantization-aware training (QAT), high-bit (>2b) (DoReFa, "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference"), low-bit (≤2b)/ternary and binary (TWN/BNN/XNOR-Net); post-training quantization (PTQ), 8-bit (TensorRT); (2) pruning: normal, reg…
- Updated May 6, 2025 - Python
TinyNeuralNetwork is an efficient and easy-to-use deep learning model compression framework.
- Updated Dec 24, 2025 - Python
[ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization
- Updated Aug 13, 2024 - Python
[ICCV 2023] Q-Diffusion: Quantizing Diffusion Models.
- Updated Mar 21, 2024 - Python
[IJCAI 2022] FQ-ViT: Post-Training Quantization for Fully Quantized Vision Transformer
- Updated Apr 11, 2023 - Python
A model compression and acceleration toolbox based on PyTorch.
- Updated Jan 12, 2024 - Python
[NeurIPS 2024 Oral🔥] DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs.
- Updated Oct 3, 2024 - Python
This repository contains notebooks that show the usage of TensorFlow Lite for quantizing deep neural networks.
- Updated Jan 23, 2023 - Jupyter Notebook
QuTLASS: CUTLASS-Powered Quantized BLAS for Deep Learning
- Updated Nov 11, 2025 - C++
Notes on quantization in neural networks
- Updated Dec 14, 2023 - Jupyter Notebook
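The core operation behind every post-training quantization scheme in this list is uniform affine quantization. A minimal pure-Python sketch, with illustrative names and the common unsigned 8-bit range assumed:

```python
# Uniform affine (asymmetric) 8-bit quantization sketch:
# map a float range [xmin, xmax] onto integers in [qmin, qmax]
# via a scale and zero-point. Illustrative, library-agnostic code.

def compute_qparams(xmin: float, xmax: float, qmin: int = 0, qmax: int = 255):
    """Derive scale and zero-point so [xmin, xmax] maps onto [qmin, qmax]."""
    xmin = min(xmin, 0.0)  # range must include 0 so zero is exactly representable
    xmax = max(xmax, 0.0)
    scale = (xmax - xmin) / (qmax - qmin)
    if scale == 0.0:
        scale = 1.0  # degenerate all-zero range
    zero_point = round(qmin - xmin / scale)
    return scale, int(zero_point)

def quantize(x: float, scale: float, zero_point: int,
             qmin: int = 0, qmax: int = 255) -> int:
    """Round to the nearest quantization level and clamp to the integer range."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))

def dequantize(q: int, scale: float, zero_point: int) -> float:
    """Recover the approximate float value."""
    return (q - zero_point) * scale
```

Forcing the representable range to include zero means a real 0.0 round-trips exactly, which matters for zero-padding and ReLU outputs; every other value incurs at most half a quantization step of error.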
[CVPR 2024 Highlight & TPAMI 2025] This is the official PyTorch implementation of "TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models".
- Updated Sep 29, 2025 - Jupyter Notebook
Quantization Meets dLLMs: A Systematic Study of Post-training Quantization for Diffusion LLMs
- Updated Jan 30, 2026 - Python
[ICLR 2024] This is the official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models"
- Updated Mar 11, 2024 - Python
Post-training static quantization using the ResNet18 architecture
- Updated Aug 1, 2020 - Jupyter Notebook
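Post-training *static* quantization, as in the entry above, adds a calibration step: an observer records activation statistics over a small calibration set, then freezes the quantization parameters. A pure-Python stand-in (not the actual PyTorch observer API; names are illustrative):

```python
# Calibration sketch for post-training static quantization: track the
# running min/max of activations over calibration batches, then derive
# fixed scale/zero-point. Illustrative stand-in, not a library API.

class MinMaxObserver:
    def __init__(self, qmin: int = 0, qmax: int = 255):
        self.qmin, self.qmax = qmin, qmax
        self.xmin = float("inf")
        self.xmax = float("-inf")

    def observe(self, batch):
        """Fold one calibration batch (iterable of floats) into the range."""
        self.xmin = min(self.xmin, min(batch), 0.0)  # keep 0 representable
        self.xmax = max(self.xmax, max(batch), 0.0)

    def qparams(self):
        """Freeze scale and zero-point after calibration."""
        scale = (self.xmax - self.xmin) / (self.qmax - self.qmin)
        zero_point = round(self.qmin - self.xmin / scale)
        return scale, int(zero_point)
```

In a real pipeline one observer is attached per activation tensor, the model is run over a few hundred calibration samples, and the frozen parameters are baked into the quantized graph.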
PyTorch implementation of our ECCV 2022 paper, "Fine-grained Data Distribution Alignment for Post-Training Quantization"
- Updated Sep 13, 2022 - Python
[ASP-DAC 2025] "NeuronQuant: Accurate and Efficient Post-Training Quantization for Spiking Neural Networks" Official Implementation
- Updated Mar 6, 2025 - Python
Improved the performance of 8-bit PTQ4DM, especially on FID.
- Updated Aug 30, 2023 - Python
Post-training quantization of an NVIDIA NeMo ASR model
- Updated Aug 23, 2023 - Jupyter Notebook