
inference-server

Here are 52 public repositories matching this topic...

⚡ Python-free Rust inference server — OpenAI-API compatible. GGUF + SafeTensors, hot model swap, auto-discovery, single binary. FREE now, FREE forever.

  • Updated Dec 17, 2025
  • Rust
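
For servers that expose an OpenAI-compatible API like this one, a client only needs the standard chat-completions route. A minimal Python sketch, assuming the server listens on localhost:8080 and that "my-local-model" stands in for whatever model is loaded (both are placeholders, not taken from this project):

import json
import urllib.request

# Most OpenAI-compatible servers expose /v1/chat/completions; host, port,
# and model name below are assumptions for illustration only.
url = "http://localhost:8080/v1/chat/completions"
payload = {
    "model": "my-local-model",
    "messages": [{"role": "user", "content": "Hello"}],
}
req = urllib.request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    body = json.load(resp)
print(body["choices"][0]["message"]["content"])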

RamaLama is an open-source developer tool that simplifies the local serving of AI models from any source and facilitates their use for inference in production, all through the familiar language of containers.

  • Updated Dec 17, 2025
  • Python
inferencetruss

A REST API for Caffe using Docker and Go

  • Updated Jul 20, 2018
  • C++

This is a repository for a no-code object detection inference API using the YOLOv3 and YOLOv4 Darknet framework.

  • Updated Jun 28, 2022
  • Python

Work with LLMs on a local environment using containers

  • Updated Dec 17, 2025
  • TypeScript

ONNX Runtime Server: a server that provides TCP and HTTP/HTTPS REST APIs for ONNX inference.

  • Updated Oct 30, 2025
  • C++

Serving AI/ML models in the open standard formats PMML and ONNX with both HTTP (REST API) and gRPC endpoints

  • Updated Dec 17, 2025
  • Scala
orkhon

Orkhon: ML Inference Framework and Server Runtime

  • Updated Feb 1, 2021
  • Rust

K3ai is a lightweight, fully automated AI-infrastructure-in-a-box solution that lets anyone experiment quickly with Kubeflow pipelines. K3ai is perfect for anything from edge devices to laptops.

  • Updated Nov 2, 2021
  • PowerShell

[⛔️ DEPRECATED] Friendli: the fastest serving engine for generative AI

  • Updated Jun 25, 2025
  • Python
wingman

Wingman is the fastest and easiest way to run Llama models on your PC or Mac.

  • Updated Jun 2, 2024
  • TypeScript

Advanced inference pipeline using NVIDIA Triton Inference Server for CRAFT text detection (PyTorch), including a converter from PyTorch -> ONNX -> TensorRT and inference pipelines (TensorRT, Triton server, multi-format). Supported model formats for Triton inference: TensorRT engine, TorchScript, ONNX.

  • Updated Aug 18, 2021
  • Python
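
The PyTorch -> ONNX step mentioned in this entry follows the standard torch.onnx.export path. A minimal sketch, not taken from this repository: the stand-in model, input shape, and file name below are assumptions for illustration.

import torch

# Stand-in model; the real pipeline would load the CRAFT text-detection network.
model = torch.nn.Sequential(torch.nn.Conv2d(3, 16, 3), torch.nn.ReLU())
model.eval()

# Assumed input shape (batch, channels, height, width).
dummy_input = torch.randn(1, 3, 768, 768)

# Export to ONNX with a dynamic batch dimension; the resulting file can then
# be converted to a TensorRT engine or served directly by Triton.
torch.onnx.export(
    model,
    dummy_input,
    "craft.onnx",
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},
)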

Improve this page

Add a description, image, and links to the inference-server topic page so that developers can more easily learn about it.

Curate this topic

Add this topic to your repo

To associate your repository with the inference-server topic, visit your repo's landing page and select "manage topics."


