inference-server

Star

Here are 52 public repositories matching this topic...

Language:All

Filter by language

All52 Python28 C++6 Jupyter Notebook4 Rust3 TypeScript2 Go1 JavaScript1 PHP1 PowerShell1 Scala1

Sort:Most stars

Sort options

Most stars Fewest stars Most forks Fewest forks Recently updated Least recently updated

Michael-A-Kuykendall /shimmy

Sponsor

Star3.5k

⚡ Python-free Rust inference server — OpenAI-API compatible. GGUF + SafeTensors, hot model swap, auto-discovery, single binary. FREE now, FREE forever.

rust machine-learning transformers api-server developer-tools llama command-line-tool lora inference-server rust-crate huggingface huggingface-transformers huggingface-models llamacpp llm-inference local-ai gguf ollama-api openai-compatible

UpdatedDec 17, 2025
Rust

RamaLama is an open-source developer tool that simplifies the local serving of AI models from any source and facilitates their use for inference in production, all through the familiar language of containers.

ai containers cuda intel hip hacktoberfest inference-server podman llm llamacpp vllm

UpdatedDec 17, 2025
Python

roboflow /inference

Star2.1k

Turn any computer or edge device into a command center for your computer vision projects.

python docker machine-learning computer-vision deployment inference classification object-detection vit agents inference-server jetson tensorrt instance-segmentation onnx inference-api yolov5 yolov8 yolo11 yolov12

UpdatedDec 17, 2025
Python

basetenlabs /truss

Star1.1k

The simplest way to serve AI/ML models in production

open-source machine-learning packaging artificial-intelligence falcon easy-to-use whisper inference-server model-serving inference-api stable-diffusion wizardlm

UpdatedDec 17, 2025
Python

pipeless-ai /pipeless

Star774

An open-source computer vision framework to build and deploy apps in minutes

python machine-learning cloud video computer-vision deep-learning gstreamer ffmpeg multimedia inference pipeline-framework artificial-intelligence stream-processing video-processing perception yolo object-detection inference-server vision-framework multimedia-applications

UpdatedMay 8, 2024
Rust

underneathall /pinferencia

Star549

Python + Inference - Model Deployment library in Python. Simplest model inference server ever.

python nlp data-science machine-learning ai computer-vision deep-learning tensorflow transformers inference pytorch artificial-intelligence inference-server predict paddlepaddle model-deployment model-serving serving huggingface modelserver

UpdatedFeb 14, 2023
Python

NVIDIA /gpu-rest-engine

Star420

A REST API for Caffe using Docker and Go

docker caffe deep-learning gpu inference inference-server

UpdatedJul 20, 2018
C++

BMW-InnovationLab /BMW-YOLOv4-Inference-API-GPU

Star281

This is a repository for an nocode object detection inference API using the Yolov3 and Yolov4 Darknet framework.

api docker dockerfile computer-vision deep-learning neural-network gpu rest-api inference yolo deeplearning inference-server bounding-boxes no-code yolov3 alexeyab-darknet yolo-gui detection-inference-api inference-gui yolov4

UpdatedJun 28, 2022
Python

containers /podman-desktop-extension-ai-lab

Star271

Work with LLMs on a local environment using containers

ai local containers inference-server podman llms

UpdatedDec 17, 2025
TypeScript

BMW-InnovationLab /BMW-YOLOv4-Inference-API-CPU

Star220

This is a repository for an nocode object detection inference API using the Yolov4 and Yolov3 Opencv.

api docker opencv deep-neural-networks cpu computer-vision deep-learning neural-network rest-api inference object-detection inference-server bounding-boxes no-code yolov3 detection-inference-api cpu-inference-api inference-gui yolov4 yolov4-darknet

UpdatedJun 28, 2022
Python

BMW-InnovationLab /BMW-TensorFlow-Inference-API-CPU

Star183

This is a repository for an object detection inference API using the Tensorflow framework.

api docker cpu computer-vision deep-learning docker-container tensorflow rest-api docker-image inference deeplearning object-detection tensorflow-framework predictions inference-server bounding-boxes computervision docker-ce inference-engine detection-inference-api

UpdatedJun 28, 2022
Python

kibae /onnxruntime-server

Sponsor

Star178

ONNX Runtime Server: The ONNX Runtime Server is a server that provides TCP and HTTP/HTTPS REST APIs for ONNX inference.

machine-learning ai deep-learning cuda inference-server nueral-networks contributions-welcome onnx onnxruntime

UpdatedOct 30, 2025
C++

autodeployai /ai-serving

Star164

Serving AI/ML models in the open standard formats PMML and ONNX with both HTTP (REST API) and gRPC endpoints

inference pmml inference-server onnx onnx-models ai-serving pmml-model onnx-inference onnx-rest pmml-deployment pmml-rest pmml-grpc onnx-grpc pmml-realtime onnx-realtime pmml-inference

UpdatedDec 17, 2025
Scala

vertexclique /orkhon

Sponsor

Star151

Orkhon: ML Inference Framework and Server Runtime

machine-learning async tensorflow multiprocessing python3 inference-server data-parallelism

UpdatedFeb 1, 2021
Rust

kf5i /k3ai

Star102

K3ai is a lightweight, fully automated, AI infrastructure-in-a-box solution that allows anyone to experiment quickly with Kubeflow pipelines. K3ai is perfect for anything from Edge to laptops.

kubernetes artificial-intelligence edge datascience machinelearning inference-server kubeflow kubeflow-pipelines k3s

UpdatedNov 2, 2021
PowerShell

notAI-tech /fastDeploy

Star102

Deploy DL/ ML inference pipelines with minimal extra code.

python docker deep-learning websocket gunicorn pytorch falcon http-server triton gevent inference-server tensorflow-serving streaming-audio model-deployment model-serving serving tf-serving torchserve triton-inference-server triton-server

UpdatedNov 20, 2024
Python

RubixML /Server

Star61

A standalone inference server for trained Rubix ML estimators.

api infrastructure php machine-learning microservice json-api rest-api inference http-server inference-server inference-engine model-deployment php-ml ml-infrastructure model-server rubix-ml php-machine-learning rubix-server

UpdatedMar 28, 2025
PHP

friendliai /friendli-client

Star49

[⛔️ DEPRECATED] Friendli: the fastest serving engine for generative AI

ai ml inference gpt inference-server mistral inference-engine serving mlops gpt3 llm stable-diffusion llms generative-ai llmops llm-serving llm-inference llama2 llm-ops

UpdatedJun 25, 2025
Python

curtisgray /wingman

Star43

Wingman is the fastest and easiest way to run Llama models on your PC or Mac.

windows macos linux downloader ai local download gpu chatbot inference openai gpu-acceleration llama inference-server inference-engine gpu-monitoring llm chatgpt llamacpp

UpdatedJun 2, 2024
TypeScript

k9ele7en /Triton-TensorRT-Inference-CRAFT-pytorch

Star33

Advanced inference pipeline using NVIDIA Triton Inference Server for CRAFT Text detection (Pytorch), included converter from Pytorch -> ONNX -> TensorRT, Inference pipelines (TensorRT, Triton server - multi-format). Supported model format for Triton inference: TensorRT engine, Torchscript, ONNX

inference pytorch text-detection nvidia-docker inference-server tensorrt inference-engine onnx onnx-torch tensorrt-conversion triton-inference-server text-detection-from-image

UpdatedAug 18, 2021
Python

Improve this page

Add a description, image, and links to theinference-server topic page so that developers can more easily learn about it.

Curate this topic

Add this topic to your repo

To associate your repository with theinference-server topic, visit your repo's landing page and select "manage topics."

Learn more

Movatterモバイル変換

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

inference-server

Here are 52 public repositories matching this topic...

Michael-A-Kuykendall /shimmy

containers /ramalama

roboflow /inference

basetenlabs /truss

pipeless-ai /pipeless

underneathall /pinferencia

NVIDIA /gpu-rest-engine

BMW-InnovationLab /BMW-YOLOv4-Inference-API-GPU

containers /podman-desktop-extension-ai-lab

BMW-InnovationLab /BMW-YOLOv4-Inference-API-CPU

BMW-InnovationLab /BMW-TensorFlow-Inference-API-CPU

kibae /onnxruntime-server

autodeployai /ai-serving

vertexclique /orkhon

kf5i /k3ai

notAI-tech /fastDeploy

RubixML /Server

friendliai /friendli-client

curtisgray /wingman

k9ele7en /Triton-TensorRT-Inference-CRAFT-pytorch

Improve this page

Add this topic to your repo