model-serving
Here are 166 public repositories matching this topic...
A high-throughput and memory-efficient inference and serving engine for LLMs
- Updated Jul 19, 2025 - Python
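The entry above describes an engine in the vLLM mold, whose memory efficiency comes from paging the KV cache: sequences are given fixed-size blocks on demand instead of one padded max-length buffer. A minimal pure-Python sketch of that allocation idea (block size, class, and method names are illustrative assumptions, not any engine's actual API):

```python
# Toy paged KV-cache allocator: a sequence claims a new block only when
# its current block fills, so memory tracks actual tokens generated
# rather than a pre-reserved maximum length.
BLOCK_SIZE = 4  # tokens per block (illustrative; real engines use 16+)

class PagedKVCache:
    def __init__(self, num_blocks):
        self.free_blocks = list(range(num_blocks))
        self.block_tables = {}  # seq_id -> list of physical block ids

    def append_token(self, seq_id, num_tokens_so_far):
        """Return the physical block for this token, allocating if needed."""
        table = self.block_tables.setdefault(seq_id, [])
        if num_tokens_so_far % BLOCK_SIZE == 0:  # current block full, or first token
            if not self.free_blocks:
                raise MemoryError("cache exhausted; a real engine would preempt")
            table.append(self.free_blocks.pop())
        return table[-1]

    def free(self, seq_id):
        """Return a finished sequence's blocks to the shared pool."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))

cache = PagedKVCache(num_blocks=8)
for t in range(6):                # a 6-token sequence needs ceil(6/4) = 2 blocks
    cache.append_token("seq0", t)
print(len(cache.block_tables["seq0"]))  # 2
```

Because blocks are freed back to a shared pool the moment a request finishes, many concurrent sequences can share the same fixed memory budget, which is what enables the high batch sizes these engines advertise.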
The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!
- Updated Jul 18, 2025 - Python
Standardized Serverless ML Inference Platform on Kubernetes
- Updated Jul 16, 2025 - Python
In this repository, I will share some useful notes and references about deploying deep learning-based models in production.
- Updated Nov 9, 2024
FEDML - The unified and scalable ML library for large-scale distributed training, model serving, and federated learning. FEDML Launch, a cross-cloud scheduler, further enables running any AI jobs on any GPU cloud or on-premise cluster. Built on this library, TensorOpera AI (https://TensorOpera.ai) is your generative AI platform at scale.
- Updated Jul 7, 2025 - Python
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
- Updated Jul 18, 2025 - Python
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
- Updated May 21, 2025 - Python
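Serving thousands of fine-tunes from one server, as the entry above describes, is feasible because a LoRA fine-tune is just a low-rank delta on top of a shared base model: y = Wx + (alpha/r)·B(Ax), with A (r×d) and B (d_out×r) tiny for small rank r. A hedged sketch of the math with toy matrices (pure Python, names illustrative):

```python
# One shared base weight W; each adapter is a small (A, B) pair that can be
# swapped per request, so memory scales with rank r, not full model size.

def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def lora_forward(W, A, B, alpha, x):
    r = len(A)                       # rank = number of rows of A
    base = matvec(W, x)              # shared base-model projection
    delta = matvec(B, matvec(A, x))  # low-rank adapter path
    scale = alpha / r
    return [b + scale * d for b, d in zip(base, delta)]

# Rank-1 toy: 2x2 identity base weight plus one adapter.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]    # 1 x 2
B = [[0.5], [0.5]]  # 2 x 1
print(lora_forward(W, A, B, alpha=1.0, x=[2.0, 0.0]))  # [3.0, 1.0]
```

A multi-LoRA server exploits this structure by batching requests for different adapters through the same W and applying each request's (A, B) pair separately.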
🚀 Awesome System for Machine Learning ⚡️ AI System Papers and Industry Practice. ⚡️ System for Machine Learning, LLM (Large Language Model), GenAI (Generative AI). 🍻 OSDI, NSDI, SIGCOMM, SoCC, MLSys, etc. 🗃️ Llama3, Mistral, etc. 🧑‍💻 Video Tutorials.
- Updated May 24, 2025
🏕️ Reproducible development environment
- Updated Jul 1, 2025 - Go
Olares: An Open-Source Personal Cloud to Reclaim Your Data
- Updated Jul 19, 2025 - Go
AICI: Prompts as (Wasm) Programs
- Updated Jan 22, 2025 - Rust
MLRun is an open source MLOps platform for quickly building and managing continuous ML applications across their lifecycle. MLRun integrates into your development and CI/CD environment and automates the delivery of production data, ML pipelines, and online applications.
- Updated Jul 18, 2025 - Python
Hopsworks - Data-Intensive AI platform with a Feature Store
- Updated Feb 10, 2025 - Java
High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability.
- Updated Jul 15, 2025 - Python
An open source DevOps tool for packaging and versioning AI/ML models, datasets, code, and configuration into an OCI artifact.
- Updated Jul 17, 2025 - Go
The simplest way to serve AI/ML models in production
- Updated Jul 19, 2025 - Python
A highly optimized LLM inference acceleration engine for Llama and its variants.
- Updated Jul 10, 2025 - C++
Community maintained hardware plugin for vLLM on Ascend
- Updated Jul 19, 2025 - Python
A high-performance ML model serving framework, offering dynamic batching and CPU/GPU pipelines to fully exploit your compute resources
- Updated Jul 11, 2025 - Python
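Dynamic batching, which the entry above highlights, means collecting incoming requests until either the batch fills or a short timeout elapses, then running them through the model in one call. A minimal sketch of that loop in pure Python (queue-based, with illustrative names and parameters rather than any specific framework's API):

```python
# Toy dynamic batcher: trades a small, bounded amount of latency
# (max_wait_s) for much higher throughput by amortizing one model call
# over several requests.
import time
from queue import Queue, Empty

def dynamic_batches(requests: Queue, max_batch=4, max_wait_s=0.01):
    """Yield batches of up to max_batch requests, waiting at most max_wait_s."""
    while True:
        batch = []
        deadline = time.monotonic() + max_wait_s
        while len(batch) < max_batch:
            timeout = deadline - time.monotonic()
            if timeout <= 0:
                break  # waited long enough; ship what we have
            try:
                batch.append(requests.get(timeout=timeout))
            except Empty:
                break  # queue drained before the deadline
        if not batch:
            return  # no work arrived this round
        yield batch  # a real server would run the model on this batch

q = Queue()
for i in range(6):
    q.put(f"req-{i}")
sizes = [len(b) for b in dynamic_batches(q)]
print(sizes)  # [4, 2]
```

The key tuning knob is the batch/latency trade-off: a larger `max_batch` and longer `max_wait_s` raise GPU utilization but add tail latency, so serving frameworks expose both as configuration.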
A throughput-oriented high-performance serving framework for LLMs
- Updated Jul 9, 2025 - Jupyter Notebook