
llm-inference

Here are 1,377 public repositories matching this topic...

GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.

  • Updated May 27, 2025
  • C++

This project aims to share the technical principles behind large language models together with hands-on experience (LLM engineering and putting LLM applications into production).

  • Updated Dec 3, 2025
  • HTML

20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.

  • Updated Dec 17, 2025
  • Python

Run any open-source LLMs, such as DeepSeek and Llama, as OpenAI compatible API endpoint in the cloud.

  • Updated Dec 15, 2025
  • Python
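
Serving projects like the one above typically expose an OpenAI-compatible HTTP API, so a standard OpenAI client can talk to a locally or cloud-deployed model just by pointing base_url at the serving endpoint. A minimal sketch under that assumption; the endpoint URL, API key, and model name below are placeholders, not values documented by any particular project listed here:

    # Query an OpenAI-compatible LLM serving endpoint with the openai Python client.
    from openai import OpenAI

    # Placeholder endpoint and key: substitute whatever your deployment exposes.
    client = OpenAI(
        base_url="http://localhost:3000/v1",
        api_key="not-needed-for-most-local-servers",
    )

    # Placeholder model name: use an identifier the server actually serves.
    response = client.chat.completions.create(
        model="llama-3.1-8b-instruct",
        messages=[{"role": "user", "content": "Explain LLM inference in one sentence."}],
    )
    print(response.choices[0].message.content)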

Official inference library for Mistral models

  • Updated Nov 21, 2025
  • Jupyter Notebook

High-speed Large Language Model Serving for Local Deployment

  • Updated Aug 2, 2025
  • C++

BentoML

The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!

  • Updated Dec 15, 2025
  • Python

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

  • Updated Dec 17, 2025
  • Python

superduper

Standardized Distributed Generative and Predictive AI Inference Platform for Scalable, Multi-Framework Deployment on Kubernetes

  • Updated Dec 16, 2025
  • Shell

Awesome-LLM-Inference

📚 A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc. 🎉

  • Updated Nov 28, 2025
  • Python

Delivery infrastructure for agents. Arch is a models-native proxy and data plane for agents that handles plumbing work in AI - like agent routing and orchestration, guardrails, zero-code logs and traces, and unified access to LLMs (OpenAI, Anthropic, Ollama, etc.). Build agents faster and deliver them reliably to prod.

  • Updated Dec 17, 2025
  • Rust

FlashInfer: Kernel Library for LLM Serving

  • Updated Dec 17, 2025
  • Cuda

Kernels & AI inference engine for mobile devices.

  • Updated Dec 17, 2025
  • C++

Improve this page

Add a description, image, and links to the llm-inference topic page so that developers can more easily learn about it.

Curate this topic

Add this topic to your repo

To associate your repository with the llm-inference topic, visit your repo's landing page and select "manage topics."

Learn more

