model-acceleration
Here are 27 public repositories matching this topic...
A curated list of neural network pruning resources.
- Updated Apr 4, 2024
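As a concrete illustration of the pruning techniques such resources cover, here is a minimal magnitude-pruning sketch in NumPy. This is an illustrative example under simple assumptions, not any listed repo's method: it zeroes out the smallest-magnitude weights until a target sparsity is reached.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Return a copy of `weights` with the smallest-magnitude entries
    set to zero. `sparsity` is the fraction of weights to remove."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)       # number of weights to drop
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value becomes the pruning threshold.
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold  # keep strictly larger entries
    return weights * mask

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))
Wp = magnitude_prune(W, sparsity=0.5)
achieved = 1.0 - np.count_nonzero(Wp) / Wp.size  # ≈ 0.5 for continuous weights
```

In practice, structured variants prune whole channels or filters (as in several repos below) so the resulting network is smaller on real hardware, not just sparser.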
A list of papers, docs, and code about model quantization. This repo aims to provide information for model quantization research and is continuously improved. PRs adding works (papers, repositories) missing from the repo are welcome.
- Updated Jan 29, 2026
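To make the quantization theme concrete, here is a minimal post-training symmetric int8 quantization sketch in NumPy. It is an illustrative example, not the implementation of any listed repo: one scale per tensor, round-to-nearest, and a check that the reconstruction error stays within half a quantization step.

```python
import numpy as np

def quantize_symmetric_int8(x):
    """Map a float tensor to int8 with a single symmetric scale."""
    max_abs = float(np.abs(x).max())
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.round(x / scale).astype(np.int8)  # values lie in [-127, 127]
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
x = rng.normal(size=1000).astype(np.float32)
q, s = quantize_symmetric_int8(x)
err = np.abs(dequantize(q, s) - x).max()  # bounded by 0.5 * scale
```

Real quantization pipelines add per-channel scales, asymmetric zero-points, and calibration data; the papers collected above survey those refinements.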
A list of high-quality, recent AutoML works and lightweight models, including 1) Neural Architecture Search; 2) Lightweight Structures; 3) Model Compression, Quantization, and Acceleration; 4) Hyperparameter Optimization; 5) Automated Feature Engineering.
- Updated Jun 19, 2021
Papers for deep neural network compression and acceleration
- Updated Jun 21, 2021
📚 Collection of awesome generation acceleration resources.
- Updated Jul 7, 2025
[TMLR 2026] Survey: https://arxiv.org/pdf/2507.20198
- Updated Feb 10, 2026
📚 Collection of token-level model compression resources.
- Updated Sep 3, 2025
[CVPR 2025] CoDe: Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient
- Updated Sep 27, 2025 - Python
Resources of our survey paper "Optimizing Edge AI: A Comprehensive Survey on Data, Model, and System Strategies"
- Updated Sep 11, 2025
[NeurIPS'24] Training-Free Adaptive Diffusion with Bounded Difference Approximation Strategy
- Updated Jan 22, 2025 - Python
MUSCO: MUlti-Stage COmpression of neural networks
- Updated Feb 16, 2021 - Jupyter Notebook
[NeurIPS 2025] ScaleKV: Memory-Efficient Visual Autoregressive Modeling with Scale-Aware KV Cache Compression
- Updated Nov 4, 2025 - Python
A list of papers, docs, and code about diffusion distillation. This repo collects various distillation methods for diffusion models. PRs adding works (papers, repositories) missed by the repo are welcome.
- Updated Dec 10, 2023
Deep Learning Compression and Acceleration SDK -- deep model compression for Edge and IoT embedded systems, and deep model acceleration for clouds and private servers
- Updated Mar 17, 2018
(NeurIPS-2019 MicroNet Challenge - 3rd Winner) Open source code for "SIPA: A simple framework for efficient networks"
- Updated Dec 18, 2022 - Python
Learn the ins and outs of efficiently serving Large Language Models (LLMs). Dive into optimization techniques, including KV caching and Low Rank Adapters (LoRA), and gain hands-on experience with Predibase’s LoRAX framework inference server.
- Updated Apr 12, 2024 - Jupyter Notebook
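The KV caching mentioned in this course can be sketched in a few lines of NumPy. This is a toy single-head attention example under stated assumptions (random projection matrices, no softmax masking subtleties), not the LoRAX or Predibase API: with a cache, each decoding step projects only the newest token and appends its key/value row, instead of reprojecting the whole prefix.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4  # head dimension (toy size)

def attend(q, K, V):
    """Scaled dot-product attention for a single query vector."""
    scores = K @ q / np.sqrt(d)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V

# Random projections standing in for trained weights.
Wk, Wv, Wq = (rng.normal(size=(d, d)) for _ in range(3))

def decode_with_cache(tokens):
    K_cache, V_cache, outputs = [], [], []
    for x in tokens:                 # x: embedding of the newest token
        K_cache.append(Wk @ x)       # cache grows by one row per step
        V_cache.append(Wv @ x)
        outputs.append(attend(Wq @ x, np.array(K_cache), np.array(V_cache)))
    return np.array(outputs)

def decode_without_cache(tokens):
    outputs = []
    for t in range(1, len(tokens) + 1):
        prefix = tokens[:t]
        K = prefix @ Wk.T            # recomputed from scratch: O(t) work per step
        V = prefix @ Wv.T
        outputs.append(attend(Wq @ prefix[-1], K, V))
    return np.array(outputs)

tokens = rng.normal(size=(6, d))
# Both paths produce identical outputs; the cached path avoids the
# quadratic reprojection cost, at the price of storing K and V.
assert np.allclose(decode_with_cache(tokens), decode_without_cache(tokens))
```

The memory cost of that stored cache is exactly what KV-cache compression work (such as the ScaleKV entry above) targets.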
Bayesian Optimization-Based Global Optimal Rank Selection for Compression of Convolutional Neural Networks, IEEE Access
- Updated Mar 21, 2021 - Python
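The low-rank compression underlying this line of work can be sketched with a truncated SVD. This is an illustrative NumPy example only; the cited paper's contribution is selecting the rank via Bayesian optimization, which is not reproduced here.

```python
import numpy as np

def low_rank_factors(W, rank):
    """Factor W ≈ A @ B with thin factors via truncated SVD."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]   # shape (m, rank)
    B = Vt[:rank]                # shape (rank, n)
    return A, B

rng = np.random.default_rng(0)
# A nearly rank-4 weight matrix plus small noise.
W = rng.normal(size=(64, 4)) @ rng.normal(size=(4, 64))
W += 0.01 * rng.normal(size=W.shape)

A, B = low_rank_factors(W, rank=4)
params_full = W.size              # 4096 parameters
params_low = A.size + B.size      # 512 parameters: an 8x reduction
rel_err = np.linalg.norm(W - A @ B) / np.linalg.norm(W)
```

Replacing a dense layer by two thin layers in this way trades a small approximation error for a large parameter and FLOP reduction; picking the right rank per layer is the hard part.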
[IJCNN'19, IEEE JSTSP'19] Caffe code for our paper "Structured Pruning for Efficient ConvNets via Incremental Regularization"; [BMVC'18] "Structured Probabilistic Pruning for Convolutional Neural Network Acceleration"
- Updated Feb 14, 2020 - Makefile
A list of papers, docs, and code about diffusion quantization. This repo collects various quantization methods for diffusion models. PRs adding works (papers, repositories) missed by the repo are welcome.
- Updated Feb 2, 2026
On Efficient Variants of Segment Anything Model
- Updated Jul 2, 2025