@SqueezeBits

SqueezeBits Inc.

We are squeezing bits.

Popular repositories

  1. QUICK Public

    QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference

    Python 116 5

  2. owlite Public

    OwLite is a low-code compression toolkit for AI models.

    Python 43 4

  3. Torch-TRTLLM Public

    Ditto is an open-source framework that enables direct conversion of HuggingFace PreTrainedModels into TensorRT-LLM engines.

    Python 30 3

  4. owlite-examples Public

    OwLite Examples repository offers illustrative example codes to help users seamlessly compress PyTorch deep learning models and transform them into TensorRT engines.

    Python 10 1

  5. .github Public

  6. mlperf_inference_results_v4.0 Public

    C++ 1

Repositories

Showing 10 of 12 repositories
  • Torch-TRTLLM Public

    Ditto is an open-source framework that enables direct conversion of HuggingFace PreTrainedModels into TensorRT-LLM engines.

    SqueezeBits/Torch-TRTLLM’s past year of commit activity
    Python 30 Apache-2.0 3 0 2 Updated Mar 21, 2025
  • owlite Public

    OwLite is a low-code compression toolkit for AI models.

    SqueezeBits/owlite’s past year of commit activity
    Python 43 AGPL-3.0 4 0 0 Updated Feb 20, 2025
  • vllm-fork Public Forked from HabanaAI/vllm-fork

    A high-throughput and memory-efficient inference and serving engine for LLMs

    SqueezeBits/vllm-fork’s past year of commit activity
    Python 0 Apache-2.0 6,494 0 0 Updated Feb 20, 2025
  • gradio Public Forked from gradio-app/gradio

    Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!

    SqueezeBits/gradio’s past year of commit activity
    Python 0 Apache-2.0 2,884 0 0 Updated Jan 13, 2025
  • TensorRT-LLM Public Forked from NVIDIA/TensorRT-LLM

    TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.

    SqueezeBits/TensorRT-LLM’s past year of commit activity
    C++ 0 Apache-2.0 1,190 0 1 Updated Dec 12, 2024
  • SqueezeBits/vllm-hpu-extension’s past year of commit activity
    Python 0 Apache-2.0 25 0 0 Updated Nov 22, 2024
  • neural-compressor Public

    Intel Neural Compressor

    SqueezeBits/neural-compressor’s past year of commit activity
    Python 0 Apache-2.0 0 0 0 Updated Oct 22, 2024
  • owlite-examples Public

    OwLite Examples repository offers illustrative example codes to help users seamlessly compress PyTorch deep learning models and transform them into TensorRT engines.

    SqueezeBits/owlite-examples’s past year of commit activity
    Python 10 1 0 1 Updated Sep 27, 2024
  • nvidia-dind Public Forked from ehfd/nvidia-dind

    Isolated DinD (Docker in Docker) container for developing and deploying Docker containers using NVIDIA GPUs and the NVIDIA container toolkit. Useful for deploying the Docker engine with NVIDIA in Kubernetes.

    SqueezeBits/nvidia-dind’s past year of commit activity
    Dockerfile 0 MPL-2.0 17 0 0 Updated Aug 27, 2024
  • SqueezeBits/mlperf_inference_results_v4.0’s past year of commit activity
    C++ 0 Apache-2.0 1 0 1 Updated Jul 23, 2024
