# llm-serving

Here are 127 public repositories matching this topic...

A high-throughput and memory-efficient inference and serving engine for LLMs

  • Updated Dec 17, 2025
  • Python

本项目旨在分享大模型相关技术原理以及实战经验(大模型工程化、大模型应用落地)

  • Updated Dec 3, 2025
  • HTML

SGLang is a fast serving framework for large language models and vision language models.

  • Updated Dec 17, 2025
  • Python

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way.

  • Updated Dec 17, 2025
  • Python

Run any open-source LLM, such as DeepSeek or Llama, as an OpenAI-compatible API endpoint in the cloud.

  • Updated Dec 15, 2025
  • Python
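Several of the engines on this page expose an OpenAI-compatible HTTP API, so existing OpenAI client code can target a self-hosted deployment just by changing the base URL. A minimal stdlib-only sketch of building and sending such a request (the base URL and model name below are placeholders, not values from this page):

```python
import json
from urllib import request


def chat_completion_payload(model: str, user_message: str,
                            temperature: float = 0.7) -> dict:
    """Build a request body for the OpenAI-compatible
    /v1/chat/completions route."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "temperature": temperature,
    }


def post_chat(base_url: str, payload: dict) -> bytes:
    """POST the payload to a running OpenAI-compatible server
    (e.g. one started locally by a serving engine from this list)."""
    req = request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return resp.read()


# Build a request; actually sending it requires a live server, e.g.:
# post_chat("http://localhost:8000", payload)
payload = chat_completion_payload("my-model", "Hello!")
```

Because the request and response schemas match OpenAI's, the official `openai` client also works against these servers by setting its `base_url` to the local endpoint.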

Run, manage, and scale AI workloads on any AI infrastructure. Use one system to access & manage all AI compute (Kubernetes, 20+ clouds, or on-prem).

  • Updated Dec 17, 2025
  • Python
BentoML

The easiest way to serve AI apps and models: build model inference APIs, job queues, LLM apps, multi-model pipelines, and more!

  • Updated Dec 15, 2025
  • Python
superduper

High-performance Inference and Deployment Toolkit for LLMs and VLMs based on PaddlePaddle

  • Updated Dec 17, 2025
  • Python

Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs

  • Updated May 21, 2025
  • Python

MoBA: Mixture of Block Attention for Long-Context LLMs

  • Updated Apr 3, 2025
  • Python

Community maintained hardware plugin for vLLM on Ascend

  • Updated Dec 17, 2025
  • Python

High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability.

  • Updated Dec 17, 2025
  • Python

RayLLM - LLMs on Ray (archived). See the README for more information.

  • Updated Mar 13, 2025

Parallax is a distributed model serving framework that lets you build your own AI cluster anywhere.

  • Updated Dec 17, 2025
  • Python

RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.

  • Updated Dec 17, 2025
  • C++

A throughput-oriented high-performance serving framework for LLMs

  • Updated Oct 29, 2025
  • Jupyter Notebook


