llm-inference

Here are 1,578 public repositories matching this topic...

GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.

  • Updated May 27, 2025
  • C++

This project shares the technical principles behind large language models together with hands-on experience (LLM engineering and bringing LLM applications to production).

  • Updated Dec 30, 2025
  • HTML

20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.

  • Updated Feb 17, 2026
  • Python

Run any open-source LLM, such as DeepSeek and Llama, as an OpenAI-compatible API endpoint in the cloud (see the request sketch after this list).

  • Updated Feb 16, 2026
  • Python

Official inference library for Mistral models

  • Updated Nov 21, 2025
  • Jupyter Notebook

High-speed Large Language Model Serving for Local Deployment

  • Updated Jan 24, 2026
  • C++
BentoML

The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!

  • Updated Feb 11, 2026
  • Python

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

  • Updated Feb 13, 2026
  • Python

Delivery infrastructure for agentic apps - Plano is an AI-native proxy and data plane that offloads plumbing work, so you stay focused on your agent's core logic (via any AI framework).

  • Updated Feb 19, 2026
  • Rust

Standardized Distributed Generative and Predictive AI Inference Platform for Scalable, Multi-Framework Deployment on Kubernetes

  • Updated Feb 20, 2026
  • Go
Awesome-LLM-Inference

📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉

  • Updated Jan 18, 2026
  • Python

FlashInfer: Kernel Library for LLM Serving

  • Updated Feb 20, 2026
  • Python

Performance-optimized AI inference on your GPUs. Unlock superior throughput by selecting and tuning engines like vLLM or SGLang.

  • Updated Feb 17, 2026
  • Python

Low-latency AI inference engine for mobile devices & wearables

  • Updated Feb 20, 2026
  • C
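
Several of the servers above (the OpenAI-compatible cloud endpoints, vLLM, SGLang, and similar) speak the OpenAI chat-completions wire format. The sketch below shows one request against such a server; the base URL and model name are assumptions for illustration, so substitute whatever your deployment exposes.

    import requests

    # Assumed local endpoint; OpenAI-compatible servers are typically
    # started on a configurable host/port such as http://localhost:8000/v1.
    BASE_URL = "http://localhost:8000/v1"

    def chat(prompt: str, model: str = "meta-llama/Llama-3.1-8B-Instruct") -> str:
        """Send one chat-completion request and return the reply text."""
        resp = requests.post(
            f"{BASE_URL}/chat/completions",
            json={
                "model": model,  # deployment-specific model name (assumption)
                "messages": [{"role": "user", "content": prompt}],
                "max_tokens": 128,
            },
            timeout=60,
        )
        resp.raise_for_status()
        return resp.json()["choices"][0]["message"]["content"]

    print(chat("In one sentence, what is paged attention?"))

Because the wire format is shared, the same client code works against any of the compatible servers listed here; only BASE_URL and the model name change.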

Improve this page

Add a description, image, and links to the llm-inference topic page so that developers can more easily learn about it.

Curate this topic

Add this topic to your repo

To associate your repository with the llm-inference topic, visit your repo's landing page and select "manage topics."

Learn more
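
Topic association can also be scripted instead of done through the web UI: GitHub's REST API provides a replace-all-topics endpoint (PUT /repos/{owner}/{repo}/topics). A minimal sketch, assuming a token with repo access in the GITHUB_TOKEN environment variable and a placeholder owner/repo:

    import os
    import requests

    def set_topics(owner: str, repo: str, topics: list[str]) -> None:
        """Replace every topic on a repository via the GitHub REST API."""
        resp = requests.put(
            f"https://api.github.com/repos/{owner}/{repo}/topics",
            headers={
                "Accept": "application/vnd.github+json",
                "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
            },
            json={"names": topics},  # note: this replaces the full list
            timeout=30,
        )
        resp.raise_for_status()

    # Placeholder repository; substitute your own.
    set_topics("your-user", "your-repo", ["llm-inference"])

Since the endpoint replaces the whole topic list, include any existing topics you want to keep in the names array.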

