# llm-serving

Here are 115 public repositories matching this topic...

A high-throughput and memory-efficient inference and serving engine for LLMs

  • Updated Oct 7, 2025
  • Python

本项目旨在分享大模型相关技术原理以及实战经验(大模型工程化、大模型应用落地)

  • Updated Aug 3, 2025
  • HTML

SGLang is a fast serving framework for large language models and vision language models.

  • Updated Oct 8, 2025
  • Python

Run any open-source LLMs, such as DeepSeek and Llama, as OpenAI compatible API endpoint in the cloud.

  • Updated Oct 6, 2025
  • Python
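Several of the engines listed here (vLLM, OpenLLM, SGLang, and others) expose an "OpenAI compatible API endpoint," meaning they serve the standard `/v1/chat/completions` HTTP route, so any generic HTTP client can talk to them. A minimal stdlib-only sketch of such a client — the base URL, port, and model name below are placeholder assumptions, not values from any specific project:

```python
import json
import urllib.request


def build_chat_payload(model, user_message, temperature=0.7):
    """Build a request body for the OpenAI-compatible /v1/chat/completions route."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "temperature": temperature,
    }


def chat(base_url, model, user_message):
    """POST a chat request to a running server and return the assistant's reply text."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_chat_payload(model, user_message)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # The OpenAI-style response nests the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]


# Example usage (assumes a compatible server is listening locally;
# both the URL and the model name are hypothetical):
# chat("http://localhost:8000", "some-org/some-model", "Hello!")
```

Because the request and response shapes are shared across these servers, the same client code works whether the backend is vLLM, OpenLLM, or any other engine on this list that advertises OpenAI compatibility.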

TensorRT LLM provides an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations for efficient inference on NVIDIA GPUs. TensorRT LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way.

  • Updated Oct 7, 2025
  • C++

Run, manage, and scale AI workloads on any AI infrastructure. Use one system to access & manage all AI compute (Kubernetes, 17+ clouds, or on-prem).

  • Updated Oct 8, 2025
  • Python
BentoML

The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!

  • Updated Oct 6, 2025
  • Python

High-performance Inference and Deployment Toolkit for LLMs and VLMs based on PaddlePaddle

  • Updated Sep 30, 2025
  • Python

Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs

  • Updated May 21, 2025
  • Python

MoBA: Mixture of Block Attention for Long-Context LLMs

  • Updated Apr 3, 2025
  • Python

High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability.

  • Updated Oct 7, 2025
  • Python

RayLLM - LLMs on Ray (Archived). Read README for more info.

  • Updated Mar 13, 2025

Community-maintained hardware plugin for vLLM on Ascend

  • Updated Oct 1, 2025
  • Python

A highly optimized LLM inference acceleration engine for Llama and its variants.

  • Updated Jul 10, 2025
  • C++

A throughput-oriented high-performance serving framework for LLMs

  • Updated Sep 17, 2025
  • Jupyter Notebook

RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.

  • Updated Oct 6, 2025
  • C++

Improve this page

Add a description, image, and links to the llm-serving topic page so that developers can more easily learn about it.


Add this topic to your repo

To associate your repository with the llm-serving topic, visit your repo's landing page and select "manage topics."

