SqueezeBits Inc.

We are squeezing bits.

Popular repositories

QUICK Public
QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference
Python 116 5
owlite Public
OwLite is a low-code compression toolkit for AI models.
Python 43 4
Torch-TRTLLM Public
Ditto is an open-source framework that enables direct conversion of HuggingFace PreTrainedModels into TensorRT-LLM engines.
Python 30 3
owlite-examples Public
The OwLite Examples repository offers illustrative example code to help users seamlessly compress PyTorch deep learning models and transform them into TensorRT engines.
Python 10 1
.github Public
mlperf_inference_results_v4.0 Public
C++ 1

Repositories

Showing 10 of 12 repositories

Torch-TRTLLM Public
Ditto is an open-source framework that enables direct conversion of HuggingFace PreTrainedModels into TensorRT-LLM engines.
Python 30 Apache-2.0 3 0 2 Updated Mar 21, 2025
owlite Public
OwLite is a low-code compression toolkit for AI models.
Python 43 AGPL-3.0 4 0 0 Updated Feb 20, 2025
vllm-fork Public Forked from HabanaAI/vllm-fork
A high-throughput and memory-efficient inference and serving engine for LLMs
Python 0 Apache-2.0 6,494 0 0 Updated Feb 20, 2025
gradio Public Forked from gradio-app/gradio
Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!
Python 0 Apache-2.0 2,884 0 0 Updated Jan 13, 2025
TensorRT-LLM Public Forked from NVIDIA/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
C++ 0 Apache-2.0 1,190 0 1 Updated Dec 12, 2024
vllm-hpu-extension Public Forked from HabanaAI/vllm-hpu-extension
Python 0 Apache-2.0 25 0 0 Updated Nov 22, 2024
neural-compressor Public
Intel Neural Compressor
Python 0 Apache-2.0 0 0 0 Updated Oct 22, 2024
owlite-examples Public
The OwLite Examples repository offers illustrative example code to help users seamlessly compress PyTorch deep learning models and transform them into TensorRT engines.
Python 10 1 0 1 Updated Sep 27, 2024
nvidia-dind Public Forked from ehfd/nvidia-dind
Isolated DinD (Docker in Docker) container for developing and deploying Docker containers using NVIDIA GPUs and the NVIDIA container toolkit. Useful for deploying the Docker engine with NVIDIA in Kubernetes.
Dockerfile 0 MPL-2.0 17 0 0 Updated Aug 27, 2024
mlperf_inference_results_v4.0 Public
C++ 0 Apache-2.0 1 0 1 Updated Jul 23, 2024