# model-serving

Here are 234 public repositories matching this topic...

A high-throughput and memory-efficient inference and serving engine for LLMs

  • Updated Feb 20, 2026
  • Python
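The high throughput of serving engines in this category typically comes from continuous batching: finished sequences leave the batch and queued requests join between decode steps, instead of the engine waiting for an entire batch to drain. A toy stdlib-only sketch of that scheduling loop (names and structure are illustrative, not any engine's actual API):

```python
from collections import deque

def continuous_batching(requests, max_batch=2):
    """Toy continuous-batching scheduler.
    requests: list of (request_id, tokens_to_generate).
    Returns request ids in completion order."""
    waiting = deque(requests)
    active = {}          # request_id -> tokens still to generate
    done = []
    while waiting or active:
        # Admit waiting requests whenever a slot frees up -- on every
        # step, not only once the whole batch has finished.
        while waiting and len(active) < max_batch:
            rid, n = waiting.popleft()
            active[rid] = n
        # One decode step: each active request emits one token.
        for rid in list(active):
            active[rid] -= 1
            if active[rid] == 0:
                done.append(rid)   # slot is freed immediately
                del active[rid]
    return done

# Short request "b" finishes early and frees its slot for "c",
# even though "a" is still decoding.
print(continuous_batching([("a", 5), ("b", 2), ("c", 3)]))  # → ['b', 'a', 'c']
```

Real engines add paged KV-cache management and GPU batching on top, but the admission policy above is the core of the throughput win.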
BentoML

The easiest way to serve AI apps and models: build model inference APIs, job queues, LLM apps, multi-model pipelines, and more.

  • Updated Feb 11, 2026
  • Python
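A multi-model pipeline behind a single API reduces to routing one payload through an ordered list of model stages. A minimal stdlib sketch of that composition (the stage names and logic are made up for illustration, not this framework's API):

```python
def make_pipeline(stages):
    """Compose named model stages into one callable pipeline."""
    def run(payload):
        for name, fn in stages:
            payload = fn(payload)   # each stage's output feeds the next
        return payload
    return run

# Hypothetical two-stage pipeline: normalize text, then "classify" it.
pipeline = make_pipeline([
    ("preprocess", lambda text: text.strip().lower()),
    ("classify", lambda text: {"label": "question" if text.endswith("?") else "statement"}),
])
print(pipeline("  Is this served by one endpoint?  "))  # → {'label': 'question'}
```

A serving framework wraps the same idea in HTTP endpoints, batching, and per-stage scaling, but the request flow is this composition.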

Standardized Distributed Generative and Predictive AI Inference Platform for Scalable, Multi-Framework Deployment on Kubernetes

  • Updated Feb 20, 2026
  • Go

In this repository, I will share some useful notes and references about deploying deep learning-based models in production.

  • Updated Nov 9, 2024

FEDML - The unified and scalable ML library for large-scale distributed training, model serving, and federated learning. FEDML Launch, a cross-cloud scheduler, further enables running any AI jobs on any GPU cloud or on-premise cluster. Built on this library, TensorOpera AI (https://TensorOpera.ai) is your generative AI platform at scale.

  • Updated Oct 28, 2025
  • Python

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

  • Updated Feb 20, 2026
  • Python

Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs

  • Updated May 21, 2025
  • Python
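Serving thousands of fine-tuned LLMs from one server is feasible because LoRA fine-tunes share the frozen base weights: each adapter stores only two small low-rank factors, and the effective weight is W + scale·(B·A). A pure-Python sketch with tiny matrices (variable names are illustrative):

```python
def matmul(X, Y):
    """Plain nested-list matrix multiply."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def apply_lora(W, A, B, scale=1.0):
    """Effective weight for one adapter: W + scale * (B @ A).
    The shared base W is never modified, so many adapters can reuse it;
    each adapter only ships the small A and B factors."""
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(wr, dr)] for wr, dr in zip(W, delta)]

W = [[1.0, 0.0], [0.0, 1.0]]   # shared 2x2 base weight
A = [[1.0, 0.0]]               # rank-1 factors (r = 1)
B = [[0.0], [2.0]]
print(apply_lora(W, A, B))     # → [[1.0, 0.0], [2.0, 1.0]]
```

In a real server the deltas are applied per request inside batched kernels rather than materialized, but the storage economics — one base, many tiny (A, B) pairs — are what make thousands of adapters practical.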

🚀 Awesome System for Machine Learning ⚡️ AI System Papers and Industry Practice. ⚡️ System for Machine Learning, LLM (Large Language Model), GenAI (Generative AI). 🍻 OSDI, NSDI, SIGCOMM, SoCC, MLSys, etc. 🗃️ Llama3, Mistral, etc. 🧑‍💻 Video Tutorials.

  • Updated Jul 25, 2025

High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability.

  • Updated Feb 16, 2026
  • Python

A framework for efficient model inference with omni-modality models

  • Updated Feb 20, 2026
  • Python

🏕️ Reproducible development environment for humans and agents

  • Updated Feb 9, 2026
  • Go

Community maintained hardware plugin for vLLM on Ascend

  • Updated Feb 14, 2026
  • C++

MLRun is an open source MLOps platform for quickly building and managing continuous ML applications across their lifecycle. MLRun integrates into your development and CI/CD environment and automates the delivery of production data, ML pipelines, and online applications.

  • Updated Feb 20, 2026
  • Python
kitops

An open source DevOps tool from the CNCF for packaging and versioning AI/ML models, datasets, code, and configuration into an OCI Artifact.

  • Updated Feb 18, 2026
  • Go
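Packaging models, datasets, and code as an OCI artifact rests on content addressing: every file becomes a blob named by its digest, and a manifest lists those digests. A stdlib-only sketch of the idea (field names and layout are illustrative, not the tool's actual format):

```python
import hashlib

def build_manifest(files):
    """files: dict of path -> bytes.
    Returns (manifest, blob_store) with content-addressed blobs."""
    blobs, layers = {}, []
    for path, data in sorted(files.items()):
        digest = "sha256:" + hashlib.sha256(data).hexdigest()
        blobs[digest] = data   # identical bytes dedupe to one blob
        layers.append({"path": path, "digest": digest, "size": len(data)})
    return {"schemaVersion": 2, "layers": layers}, blobs

manifest, blobs = build_manifest({
    "model/weights.bin": b"\x00\x01",
    "config.json": b"{}",
})
for layer in manifest["layers"]:
    print(layer["path"], layer["digest"][:15])
```

Because the same bytes always map to the same digest, registries can deduplicate pushes and verify integrity on pull — the property that makes OCI registries a natural home for versioned model artifacts.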

RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.

  • Updated Feb 20, 2026
  • Cuda

A throughput-oriented high-performance serving framework for LLMs

  • Updated Oct 29, 2025
  • Jupyter Notebook

