llm-compression
Here are 8 public repositories matching this topic.
A curated list for Efficient Large Language Models
Updated Mar 14, 2025 · Python
Pruner-Zero: Evolving Symbolic Pruning Metric from scratch for LLMs
Updated Nov 25, 2024 · Python
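Pruner-Zero searches for its pruning metric symbolically, and the evolved expression lives in the repository itself. As a rough illustration of what a weight-saliency pruning metric looks like in practice, here is a minimal sketch that scores weights by magnitude times calibration-activation norm and masks the lowest-scoring ones; the function name, the saliency form, and the 50% sparsity default are illustrative assumptions, not the repo's API.

```python
import torch

def prune_by_saliency(weight: torch.Tensor,
                      act_norm: torch.Tensor,
                      sparsity: float = 0.5) -> torch.Tensor:
    """Zero out the lowest-saliency weights in place and return the mask.

    weight:   (out_features, in_features) linear-layer weight
    act_norm: (in_features,) per-input-channel L2 norm of calibration activations
    sparsity: fraction of weights to remove
    """
    # Illustrative saliency: |W| scaled by activation norm, one candidate
    # form a symbolic search over pruning metrics might discover.
    saliency = weight.abs() * act_norm.unsqueeze(0)
    k = int(weight.numel() * sparsity)
    # Threshold at the k-th smallest saliency; removes exactly k weights
    # assuming no ties.
    threshold = saliency.flatten().kthvalue(k).values
    mask = saliency > threshold
    weight.mul_(mask)
    return mask

# Usage on a toy layer:
w = torch.randn(8, 16)
norms = torch.rand(16)
mask = prune_by_saliency(w, norms, sparsity=0.5)
print(f"kept {mask.float().mean():.0%} of weights")
```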
D^2-MoE: Delta Decompression for MoE-based LLMs Compression
Updated Mar 3, 2025 · Python
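The D^2-MoE description points at compressing MoE experts by decomposing them into a shared component plus per-expert deltas. A minimal sketch, assuming each expert is stored as a shared base matrix plus a truncated-SVD low-rank delta; the function names, the mean-as-base choice, and the rank are illustrative assumptions, not the paper's exact construction.

```python
import torch

def compress_experts(experts: list[torch.Tensor], rank: int = 8):
    """Store N same-shape expert matrices as one shared base plus
    low-rank deltas: the base is their mean, and each delta
    (expert - base) is truncated to `rank` via SVD."""
    base = torch.stack(experts).mean(dim=0)
    factors = []
    for w in experts:
        u, s, vh = torch.linalg.svd(w - base, full_matrices=False)
        # Keep only the top-`rank` singular triplets of the delta.
        factors.append((u[:, :rank] * s[:rank], vh[:rank]))
    return base, factors

def decompress_expert(base, factors, i):
    """Reconstruct expert i as base + low-rank delta."""
    us, vh = factors[i]
    return base + us @ vh

# Toy check: experts clustered near a common base reconstruct well.
experts = [torch.ones(64, 64) + 0.1 * torch.randn(64, 64) for _ in range(4)]
base, factors = compress_experts(experts, rank=8)
err = (decompress_expert(base, factors, 0) - experts[0]).norm() / experts[0].norm()
print(f"relative reconstruction error: {err:.3f}")
```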
[ICLR 2024] Jaiswal, A., Gan, Z., Du, X., Zhang, B., Wang, Z., & Yang, Y. Compressing LLMs: The Truth is Rarely Pure and Never Simple.
Updated Mar 13, 2024 · Python
LLM Inference on AWS Lambda
Updated Jun 3, 2024 · Python
A collection of papers on LLM compression
Updated Mar 6, 2024
[CAAI AIR'24] Minimize Quantization Output Error with Bias Compensation
Updated Mar 12, 2025 · Python
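Bias compensation for quantization admits a simple closed form: after quantizing W to W_q, add a per-output bias that cancels the mean output error over calibration data, b = (W - W_q) E[x]. The sketch below demonstrates that idea in NumPy; the naive symmetric quantizer and all names are illustrative assumptions and may differ from the CAAI AIR'24 formulation.

```python
import numpy as np

def quantize_sym(w: np.ndarray, n_bits: int = 4) -> np.ndarray:
    """Naive symmetric uniform quantization (illustrative only)."""
    scale = np.abs(w).max() / (2 ** (n_bits - 1) - 1)
    return np.round(w / scale) * scale

rng = np.random.default_rng(0)
# Calibration activations with a nonzero mean, so bias compensation
# has an error component to cancel.
X = rng.normal(loc=1.0, size=(1024, 64))   # (n_samples, in_features)
W = rng.normal(size=(16, 64))              # (out_features, in_features)

W_q = quantize_sym(W)

# Choose b to minimize E||W x - (W_q x + b)||^2, which gives the
# closed form b = (W - W_q) @ E[x]: it removes the mean output error.
b = (W - W_q) @ X.mean(axis=0)

err_plain = np.linalg.norm(X @ W.T - X @ W_q.T)
err_comp = np.linalg.norm(X @ W.T - (X @ W_q.T + b))
print(f"output error without compensation: {err_plain:.2f}")
print(f"output error with bias compensation: {err_comp:.2f}")
```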
Official implementation of "iShrink: Making 1B Models Even Smaller and Faster", a structured pruning approach that compresses 1B-parameter language models while preserving their performance and improving efficiency.
Updated Jan 14, 2025 · Python
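iShrink's specific pruning criteria are defined in the repository. As a generic illustration of what structured pruning means (removing whole hidden units so the weight matrices physically shrink, unlike mask-based unstructured pruning), here is a hedged PyTorch sketch; the layer pairing, the norm-based scoring, and the keep ratio are assumptions, not iShrink's method.

```python
import torch
import torch.nn as nn

def prune_ffn_neurons(up: nn.Linear, down: nn.Linear, keep_ratio: float = 0.75):
    """Structurally prune an FFN block down(act(up(x))) by dropping the
    lowest-scoring hidden units, shrinking both matrices for real speedups."""
    hidden = up.out_features
    keep = int(hidden * keep_ratio)
    # Score each hidden unit by the norms of its incoming and outgoing weights.
    score = up.weight.norm(dim=1) * down.weight.norm(dim=0)
    idx = score.topk(keep).indices.sort().values

    new_up = nn.Linear(up.in_features, keep, bias=up.bias is not None)
    new_down = nn.Linear(keep, down.out_features, bias=down.bias is not None)
    with torch.no_grad():
        new_up.weight.copy_(up.weight[idx])
        if up.bias is not None:
            new_up.bias.copy_(up.bias[idx])
        new_down.weight.copy_(down.weight[:, idx])
        if down.bias is not None:
            new_down.bias.copy_(down.bias)
    return new_up, new_down

# Toy FFN: 256 -> 1024 -> 256, pruned to 768 hidden units.
up, down = nn.Linear(256, 1024), nn.Linear(1024, 256)
up2, down2 = prune_ffn_neurons(up, down, keep_ratio=0.75)
print(up2.weight.shape, down2.weight.shape)  # (768, 256), (256, 768)
```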