# llm-serving

Here are 109 public repositories matching this topic...

A high-throughput and memory-efficient inference and serving engine for LLMs

  • Updated Jul 18, 2025
  • Python

本项目旨在分享大模型相关技术原理以及实战经验(大模型工程化、大模型应用落地)

  • Updated Jul 10, 2025
  • HTML

SGLang is a fast serving framework for large language models and vision language models.

  • Updated Jul 18, 2025
  • Python

Run any open-source LLMs, such as DeepSeek and Llama, as OpenAI compatible API endpoint in the cloud.

  • Updated Jul 14, 2025
  • Python
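To make "OpenAI compatible" concrete: such servers accept the standard chat-completions request shape, so any client can target them by building the usual JSON payload and POSTing it to the server's `/v1/chat/completions` route. The sketch below only constructs the payload; the model name is a placeholder, not a value from any specific project.

```python
import json

def build_chat_request(model: str, user_message: str,
                       max_tokens: int = 128,
                       temperature: float = 0.7) -> dict:
    """Build a request body in the OpenAI-compatible chat-completions
    format that servers like those listed here accept."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }

# Placeholder model name; a real client would POST this JSON to
# http://<server>/v1/chat/completions with requests or httpx.
payload = build_chat_request("my-model", "Hello!")
print(json.dumps(payload, indent=2))
```

Because the request format is shared, the same client code works against any of the OpenAI-compatible servers on this page by changing only the base URL and model name.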

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way.

  • Updated Jul 18, 2025
  • C++

SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 16+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.

  • Updated Jul 18, 2025
  • Python
BentoML

The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!

  • Updated Jul 18, 2025
  • Python

High-performance Inference and Deployment Toolkit for LLMs and VLMs based on PaddlePaddle

  • Updated Jul 18, 2025
  • Python

Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs

  • Updated May 21, 2025
  • Python

MoBA: Mixture of Block Attention for Long-Context LLMs

  • Updated Apr 3, 2025
  • Python

RayLLM - LLMs on Ray (Archived). Read README for more info.

  • Updated Mar 13, 2025

High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability.

  • Updated Jul 15, 2025
  • Python

A highly optimized LLM inference acceleration engine for Llama and its variants.

  • Updated Jul 10, 2025
  • C++

Community maintained hardware plugin for vLLM on Ascend

  • Updated Jul 18, 2025
  • Python
mosec

A high-performance ML model serving framework, offers dynamic batching and CPU/GPU pipelines to fully exploit your compute machine

  • Updated Jul 11, 2025
  • Python
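Dynamic batching, as mentioned in the mosec description, means grouping pending requests so the model processes them together instead of one at a time. Below is a minimal, framework-agnostic sketch of the idea, not mosec's actual API; a production server would also apply a wait timeout so small batches are not held indefinitely.

```python
from collections import deque

def drain_batches(queue: deque, max_batch_size: int) -> list:
    """Greedily split queued requests into batches of at most
    max_batch_size, preserving arrival order."""
    batches = []
    while queue:
        batch = []
        while queue and len(batch) < max_batch_size:
            batch.append(queue.popleft())
        batches.append(batch)
    return batches

# Seven pending requests drained with a batch-size cap of 3.
pending = deque(range(7))
batches = drain_batches(pending, max_batch_size=3)
print(batches)  # [[0, 1, 2], [3, 4, 5], [6]]
```

The batch-size cap trades latency for throughput: larger batches use the accelerator more efficiently, while the timeout (omitted here) bounds how long an early request waits for companions.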

A throughput-oriented high-performance serving framework for LLMs

  • Updated Jul 9, 2025
  • Jupyter Notebook

