# model-serving

Here are 234 public repositories matching this topic...

A high-throughput and memory-efficient inference and serving engine for LLMs

  • Updated Feb 20, 2026
  • Python
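The high throughput of serving engines in this category typically comes from continuous batching: finished sequences leave the batch and queued requests join between decode steps, instead of the engine waiting for an entire batch to drain. A toy stdlib-only sketch of that scheduling loop (names and structure are illustrative, not any engine's actual API):

```python
from collections import deque

def continuous_batching(requests, max_batch=2):
    """Toy continuous-batching scheduler.
    requests: list of (request_id, tokens_to_generate).
    Returns request ids in completion order."""
    waiting = deque(requests)
    active = {}          # request_id -> tokens still to generate
    done = []
    while waiting or active:
        # Admit waiting requests whenever a slot frees up -- on every
        # step, not only once the whole batch has finished.
        while waiting and len(active) < max_batch:
            rid, n = waiting.popleft()
            active[rid] = n
        # One decode step: each active request emits one token.
        for rid in list(active):
            active[rid] -= 1
            if active[rid] == 0:
                done.append(rid)   # slot is freed immediately
                del active[rid]
    return done

# Short request "b" finishes early and frees its slot for "c",
# even though "a" is still decoding.
print(continuous_batching([("a", 5), ("b", 2), ("c", 3)]))  # → ['b', 'a', 'c']
```

Real engines add paged KV-cache management and GPU batching on top, but the admission policy above is the core of the throughput win.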
BentoML

The easiest way to serve AI apps and models: build model inference APIs, job queues, LLM apps, multi-model pipelines, and more.

  • Updated Feb 11, 2026
  • Python
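A multi-model pipeline behind a single API reduces to routing one payload through an ordered list of model stages. A minimal stdlib sketch of that composition (the stage names and logic are made up for illustration, not this framework's API):

```python
def make_pipeline(stages):
    """Compose named model stages into one callable pipeline."""
    def run(payload):
        for name, fn in stages:
            payload = fn(payload)   # each stage's output feeds the next
        return payload
    return run

# Hypothetical two-stage pipeline: normalize text, then "classify" it.
pipeline = make_pipeline([
    ("preprocess", lambda text: text.strip().lower()),
    ("classify", lambda text: {"label": "question" if text.endswith("?") else "statement"}),
])
print(pipeline("  Is this served by one endpoint?  "))  # → {'label': 'question'}
```

A serving framework wraps the same idea in HTTP endpoints, batching, and per-stage scaling, but the request flow is this composition.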

Standardized Distributed Generative and Predictive AI Inference Platform for Scalable, Multi-Framework Deployment on Kubernetes

  • Updated Feb 20, 2026
  • Go

In this repository, I will share some useful notes and references about deploying deep learning-based models in production.

  • Updated Nov 9, 2024

FEDML - The unified and scalable ML library for large-scale distributed training, model serving, and federated learning. FEDML Launch, a cross-cloud scheduler, further enables running any AI jobs on any GPU cloud or on-premise cluster. Built on this library, TensorOpera AI (https://TensorOpera.ai) is your generative AI platform at scale.

  • Updated Oct 28, 2025
  • Python

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

  • Updated Feb 20, 2026
  • Python

Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs

  • Updated May 21, 2025
  • Python
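Serving thousands of fine-tuned LLMs from one server is feasible because LoRA fine-tunes share the frozen base weights: each adapter stores only two small low-rank factors, and the effective weight is W + scale·(B·A). A pure-Python sketch with tiny matrices (variable names are illustrative):

```python
def matmul(X, Y):
    """Plain nested-list matrix multiply."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def apply_lora(W, A, B, scale=1.0):
    """Effective weight for one adapter: W + scale * (B @ A).
    The shared base W is never modified, so many adapters can reuse it;
    each adapter only ships the small A and B factors."""
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(wr, dr)] for wr, dr in zip(W, delta)]

W = [[1.0, 0.0], [0.0, 1.0]]   # shared 2x2 base weight
A = [[1.0, 0.0]]               # rank-1 factors (r = 1)
B = [[0.0], [2.0]]
print(apply_lora(W, A, B))     # → [[1.0, 0.0], [2.0, 1.0]]
```

In a real server the deltas are applied per request inside batched kernels rather than materialized, but the storage economics — one base, many tiny (A, B) pairs — are what make thousands of adapters practical.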

🚀 Awesome System for Machine Learning ⚡️ AI System Papers and Industry Practice. ⚡️ System for Machine Learning, LLM (Large Language Model), GenAI (Generative AI). 🍻 OSDI, NSDI, SIGCOMM, SoCC, MLSys, etc. 🗃️ Llama3, Mistral, etc. 🧑‍💻 Video Tutorials.

  • Updated Jul 25, 2025

High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability.

  • Updated Feb 16, 2026
  • Python

A framework for efficient model inference with omni-modality models

  • Updated Feb 20, 2026
  • Python

🏕️ Reproducible development environment for humans and agents

  • Updated Feb 9, 2026
  • Go

Community maintained hardware plugin for vLLM on Ascend

  • Updated Feb 14, 2026
  • C++

MLRun is an open source MLOps platform for quickly building and managing continuous ML applications across their lifecycle. MLRun integrates into your development and CI/CD environment and automates the delivery of production data, ML pipelines, and online applications.

  • Updated Feb 20, 2026
  • Python
kitops

An open source DevOps tool from the CNCF for packaging and versioning AI/ML models, datasets, code, and configuration into an OCI Artifact.

  • Updated Feb 18, 2026
  • Go
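Packaging models, datasets, and code as an OCI artifact rests on content addressing: every file becomes a blob named by its digest, and a manifest lists those digests. A stdlib-only sketch of the idea (field names and layout are illustrative, not the tool's actual format):

```python
import hashlib

def build_manifest(files):
    """files: dict of path -> bytes.
    Returns (manifest, blob_store) with content-addressed blobs."""
    blobs, layers = {}, []
    for path, data in sorted(files.items()):
        digest = "sha256:" + hashlib.sha256(data).hexdigest()
        blobs[digest] = data   # identical bytes dedupe to one blob
        layers.append({"path": path, "digest": digest, "size": len(data)})
    return {"schemaVersion": 2, "layers": layers}, blobs

manifest, blobs = build_manifest({
    "model/weights.bin": b"\x00\x01",
    "config.json": b"{}",
})
for layer in manifest["layers"]:
    print(layer["path"], layer["digest"][:15])
```

Because the same bytes always map to the same digest, registries can deduplicate pushes and verify integrity on pull — the property that makes OCI registries a natural home for versioned model artifacts.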

RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.

  • Updated Feb 20, 2026
  • Cuda

A throughput-oriented high-performance serving framework for LLMs

  • Updated Oct 29, 2025
  • Jupyter Notebook

