llm-serving
Here are 109 public repositories matching this topic...
A high-throughput and memory-efficient inference and serving engine for LLMs
Updated Jul 18, 2025 - Python
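This is vLLM's tagline. For context, a minimal sketch of its offline batch-inference Python API follows; the model id and sampling settings are illustrative choices, not part of the listing.

```python
# Minimal sketch of offline batch inference with vLLM's Python API.
# Model id and sampling settings are illustrative assumptions.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # loads the model weights
params = SamplingParams(temperature=0.8, max_tokens=64)

# generate() batches the prompts and schedules them with continuous batching
outputs = llm.generate(["What is PagedAttention?"], params)
for out in outputs:
    print(out.outputs[0].text)
```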
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
Updated Jul 18, 2025 - Python
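Since Ray's value here is its distributed runtime, a minimal sketch of its core remote-task API follows; the toy workload is an illustrative assumption.

```python
# Minimal sketch of Ray's remote-task API with a toy workload.
import ray

ray.init()  # start (or connect to) a local Ray runtime

@ray.remote
def square(x):
    return x * x

# Tasks run in parallel across the cluster; ray.get blocks on the futures.
futures = [square.remote(i) for i in range(8)]
print(ray.get(futures))  # [0, 1, 4, 9, 16, 25, 36, 49]
```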
This project shares the technical principles behind large language models along with hands-on experience (LLM engineering and real-world LLM application deployment).
Updated Jul 10, 2025 - HTML
SGLang is a fast serving framework for large language models and vision language models.
Updated Jul 18, 2025 - Python
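A sketch of SGLang's frontend DSL talking to a locally launched server; the model id, port, and prompt are illustrative assumptions.

```python
# Sketch of SGLang's frontend DSL; assumes a server was started separately, e.g.:
#   python -m sglang.launch_server --model-path meta-llama/Llama-3.1-8B-Instruct --port 30000
import sglang as sgl

sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))

@sgl.function
def qa(s, question):
    s += sgl.user(question)
    s += sgl.assistant(sgl.gen("answer", max_tokens=64))

state = qa.run(question="What is RadixAttention?")
print(state["answer"])
```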
Run any open-source LLM, such as DeepSeek and Llama, as an OpenAI-compatible API endpoint in the cloud.
Updated Jul 14, 2025 - Python
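This describes OpenLLM. Because the server speaks the OpenAI protocol, any OpenAI client can call it; a minimal sketch follows, with the serve command, model id, and port as illustrative assumptions.

```python
# Sketch of calling an OpenLLM server through its OpenAI-compatible endpoint,
# e.g. after something like `openllm serve llama3.2:1b` (model id and port
# are illustrative assumptions).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:3000/v1", api_key="na")
resp = client.chat.completions.create(
    model="llama3.2:1b",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(resp.choices[0].message.content)
```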
TensorRT-LLM provides an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations for efficient inference on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that orchestrate inference execution in a performant way.
Updated Jul 18, 2025 - C++
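A minimal sketch of TensorRT-LLM's high-level Python LLM API, assuming a recent tensorrt_llm build and an NVIDIA GPU; the model id and sampling settings are illustrative.

```python
# Sketch of TensorRT-LLM's high-level LLM API; the engine is compiled on
# first load. Model id and sampling settings are illustrative assumptions.
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
outputs = llm.generate(
    ["What does kernel fusion buy you?"],
    SamplingParams(max_tokens=64, temperature=0.7),
)
print(outputs[0].outputs[0].text)
```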
SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 16+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
Updated Jul 18, 2025 - Python
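A sketch of launching a GPU job with SkyPilot's Python API; the accelerator type, cluster name, and commands are illustrative assumptions.

```python
# Sketch of SkyPilot's Python API; accelerator, cluster name, and the
# serving command are illustrative assumptions.
import sky

task = sky.Task(
    setup="pip install vllm",
    run="python -m vllm.entrypoints.openai.api_server --model $MODEL",
    envs={"MODEL": "meta-llama/Llama-3.1-8B-Instruct"},
)
task.set_resources(sky.Resources(accelerators="A100:1"))

# SkyPilot picks a cloud/region with availability and provisions the cluster.
sky.launch(task, cluster_name="llm-serve")
```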
The easiest way to serve AI apps and models: build model inference APIs, job queues, LLM apps, multi-model pipelines, and more!
Updated Jul 18, 2025 - Python
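This is BentoML's tagline. A minimal sketch of a service definition in the BentoML 1.2+ style follows; the model and endpoint shape are illustrative assumptions.

```python
# Sketch of a BentoML (1.2+) service; model and endpoint are illustrative.
# Serve locally with: bentoml serve service:Summarizer
import bentoml
from transformers import pipeline

@bentoml.service
class Summarizer:
    def __init__(self) -> None:
        self.pipe = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

    @bentoml.api
    def summarize(self, text: str) -> str:
        return self.pipe(text)[0]["summary_text"]
```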
Superduper: End-to-end framework for building custom AI applications and agents.
Updated Jul 16, 2025 - Python
High-performance Inference and Deployment Toolkit for LLMs and VLMs based on PaddlePaddle
Updated Jul 18, 2025 - Python
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
Updated May 21, 2025 - Python
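This matches LoRAX. A sketch of selecting a LoRA adapter per request over its TGI-style REST API follows; the URL, adapter id, and payload fields are assumptions based on its generate endpoint.

```python
# Sketch of a per-request adapter selection against a multi-LoRA server in
# the style of LoRAX; URL, adapter id, and payload fields are assumptions.
import requests

resp = requests.post(
    "http://localhost:8080/generate",
    json={
        "inputs": "Classify the sentiment: I loved this movie.",
        "parameters": {"max_new_tokens": 32, "adapter_id": "my-org/sst2-lora"},
    },
)
print(resp.json()["generated_text"])
```

Serving many adapters on one base model is the point: the server keeps the base weights resident and swaps lightweight LoRA deltas per request.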
Simple, scalable AI model deployment on GPU clusters
Updated Jul 18, 2025 - Python
AICI: Prompts as (Wasm) Programs
Updated Jan 22, 2025 - Rust
MoBA: Mixture of Block Attention for Long-Context LLMs
Updated Apr 3, 2025 - Python
High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability.
Updated Jul 15, 2025 - Python
A highly optimized LLM inference acceleration engine for Llama and its variants.
Updated Jul 10, 2025 - C++
Community-maintained hardware plugin for vLLM on Ascend
Updated Jul 18, 2025 - Python
A high-performance ML model serving framework that offers dynamic batching and CPU/GPU pipelines to fully exploit your compute resources
Updated Jul 11, 2025 - Python
A throughput-oriented high-performance serving framework for LLMs
Updated Jul 9, 2025 - Jupyter Notebook