quantization
Here are 773 public repositories matching this topic...
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
- Updated May 9, 2025 - Python
Chinese LLaMA & Alpaca large language models, with local CPU/GPU training and deployment (Chinese LLaMA & Alpaca LLMs)
- Updated Apr 30, 2024 - Python
Faster Whisper transcription with CTranslate2
- Updated Apr 29, 2025 - Python
[🔥updating ...] AI automated quantitative trading bot (fully local deployment). AI-powered Quantitative Investment Research Platform. 📃 Online docs: https://ufund-me.github.io/Qbot ✨ qbot-mini: https://github.com/Charmve/iQuant
- Updated May 5, 2025 - Jupyter Notebook
Accessible large language models via k-bit quantization for PyTorch.
- Updated May 9, 2025 - Python
Lossy PNG compressor — pngquant command based on libimagequant library
- Updated Jan 23, 2025 - C
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
- Updated Apr 11, 2025 - Python
Neural Network Distiller by Intel AI Lab: a Python package for neural network compression research. https://intellabs.github.io/distiller
- Updated Apr 24, 2023 - Jupyter Notebook
Fast inference engine for Transformer models
- Updated Apr 8, 2025 - C++
Sparsity-aware deep learning inference runtime for CPUs
- Updated May 5, 2025 - Python
Pretrained language models and related optimization techniques developed by Huawei Noah's Ark Lab.
- Updated Jan 22, 2024 - Python
A model library for exploring state-of-the-art deep learning topologies and techniques for optimizing Natural Language Processing neural networks
- Updated Nov 7, 2022 - Python
🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM and Sentence Transformers with easy-to-use hardware optimization tools
- Updated May 9, 2025 - Python
Base pretrained models and datasets in PyTorch (MNIST, SVHN, CIFAR10, CIFAR100, STL10, AlexNet, VGG16, VGG19, ResNet, Inception, SqueezeNet)
- Updated Nov 22, 2022 - Python
Build, customize and control your own LLMs. From data pre-processing to fine-tuning, xTuring provides an easy way to personalize open-source LLMs. Join our Discord community: https://discord.gg/TgHXuSJEk6
- Updated Sep 23, 2024 - Python
SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
- Updated May 9, 2025 - Python
Run Mixtral-8x7B models in Colab or consumer desktops
- Updated Apr 8, 2024 - Python
AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
- Updated May 10, 2025 - Python
micronet, a model compression and deployment library. Compression: 1. quantization: quantization-aware training (QAT), high-bit (>2b) (DoReFa / Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference), low-bit (≤2b) / ternary and binary (TWN/BNN/XNOR-Net); post-training quantization (PTQ), 8-bit (TensorRT); 2. pruning: normal, reg…
- Updated May 6, 2025 - Python
A list of papers, docs, and code about model quantization. This repo aims to provide information for model quantization research, and we are continuously improving the project. PRs for works (papers, repositories) missing from the repo are welcome.
- Updated Mar 4, 2025