
tensorrt-llm

Here are 30 public repositories matching this topic...

Awesome-LLM-Inference

📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉

  • Updated Aug 19, 2025
  • Python

An optimized speech-to-text pipeline for the Whisper model, supporting multiple inference engines

  • Updated Aug 27, 2024
  • Jupyter Notebook

🚀🚀🚀 This repository lists some awesome public CUDA, cuda-python, cuBLAS, cuDNN, CUTLASS, TensorRT, TensorRT-LLM, Triton, TVM, MLIR, PTX and High Performance Computing (HPC) projects.

  • Updated Aug 2, 2025

🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers, and Sentence-Transformers, with full support for Optimum's hardware optimizations and quantization schemes.

  • Updated Sep 25, 2025
  • Python

OpenAI-compatible API for the TensorRT-LLM Triton backend

  • Updated Aug 1, 2024
  • Rust
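"OpenAI-compatible" means the server accepts the same request shape as the OpenAI chat-completions endpoint, so existing OpenAI SDK clients can simply point at the local URL. A minimal sketch of such a request payload; the base URL and the model name `ensemble` are assumptions for illustration, not taken from this repository:

```python
import json

# Hypothetical local endpoint; the actual host/port depend on the deployment.
BASE_URL = "http://localhost:8000/v1/chat/completions"

# The body follows the OpenAI chat-completions schema: a model name plus a
# list of role/content messages, with the usual sampling controls.
payload = {
    "model": "ensemble",  # Triton model name; an assumption here
    "messages": [
        {"role": "user", "content": "Hello"},
    ],
    "max_tokens": 64,
    "stream": False,
}

# Serialize to the JSON body that would be POSTed to BASE_URL.
body = json.dumps(payload)
print(body)
```

Because the schema matches, switching a client from the hosted API to a local TensorRT-LLM server is typically just a base-URL change.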

Deep learning deployment framework: supports tf/torch/trt/trtllm/vllm and other NN frameworks, with dynamic batching and streaming modes. It is dual-language compatible with Python and C++, offering scalability, extensibility, and high performance, and helps users quickly deploy models and serve them through HTTP/RPC interfaces.

  • Updated May 8, 2025
  • C++

A higher-performance OpenAI-compatible LLM service than vLLM serve: a pure C++ implementation built on gRPC + TensorRT-LLM + Tokenizers.cpp, supporting chat and function calls, AI agents, distributed multi-GPU inference, multimodal capabilities, and a Gradio chat interface.

  • Updated May 14, 2025
  • Python

TensorRT-LLM server with Structured Outputs (JSON) built with Rust

  • Updated Apr 25, 2025
  • Rust

LLM-Inference-Bench

  • Updated Jul 18, 2025
  • Jupyter Notebook

A tool for benchmarking LLMs on Modal

  • Updated Aug 29, 2025
  • Python

Cortex.Tensorrt-LLM is a C++ inference library that can be loaded by any server at runtime. It includes NVIDIA's TensorRT-LLM as a submodule for GPU-accelerated inference on NVIDIA GPUs.

  • Updated Sep 26, 2024
  • C++

An add-in for the new Outlook that adds LLM-powered features (composition, summarizing, Q&A). It uses a local LLM via NVIDIA TensorRT-LLM.

  • Updated Jun 5, 2025
  • Python

Getting started with TensorRT-LLM using BLOOM as a case study

  • Updated Mar 7, 2024
  • Jupyter Notebook

AI infrastructure for LLM inference: tensorrt-llm, vllm

  • Updated Dec 17, 2024
  • Python

Accelerating LLM inference frameworks to make LLMs fly

  • Updated May 10, 2024
  • Python

Whisper in TensorRT-LLM

  • Updated Sep 21, 2023
  • C++

LLM tutorial materials covering, but not limited to, NVIDIA NeMo, TensorRT-LLM, Triton Inference Server, and NeMo Guardrails.

  • Updated Jun 26, 2025
  • Python

MiniMax-01 is a simple implementation of the MiniMax algorithm, a widely used strategy for decision-making in two-player turn-based games like Tic-Tac-Toe. The algorithm aims to minimize the maximum possible loss for the player, making it a popular choice for developing AI opponents in various game scenarios.

  • Updated Oct 13, 2025
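The minimax recursion described above can be sketched in a few lines. This is a generic illustration over a small game tree, not this repository's code: leaves are numeric payoffs for the maximizing player, internal nodes are lists of children, and the two players alternate between taking the max and the min of the child values.

```python
def minimax(node, maximizing):
    """Return the game value of `node` with optimal play by both sides."""
    # Leaves are numeric payoffs; internal nodes are lists of children.
    if isinstance(node, (int, float)):
        return node
    # The turn alternates at each level of the tree.
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# A depth-2 tree: the maximizing player picks a branch, then the minimizing
# opponent picks the worst (for us) leaf inside that branch.
tree = [[3, 5], [2, 9]]
print(minimax(tree, True))  # 3: branch [3, 5] guarantees at least 3
```

Picking branch `[2, 9]` would tempt with the 9, but the opponent would answer with 2; minimizing the maximum possible loss is exactly why branch `[3, 5]` is chosen.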

