llm-inference
Here are 1,578 public repositories matching this topic...
GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.
- Updated May 27, 2025 - C++
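As a quick orientation, here is a minimal sketch of running a model locally through GPT4All's Python bindings; the model filename is an illustrative assumption (any GGUF model GPT4All supports would do):

```python
from gpt4all import GPT4All

# Model filename is an assumption; GPT4All downloads it on first use.
model = GPT4All("Meta-Llama-3-8B-Instruct.Q4_0.gguf")
with model.chat_session():
    # The reply is generated entirely on the local machine.
    print(model.generate("Why is the sky blue?", max_tokens=256))
```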
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
- Updated Feb 20, 2026 - Python
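For a sense of the core distributed runtime, a minimal sketch of Ray's task API, with a trivial placeholder function standing in for real ML work:

```python
import ray

ray.init()  # start a local Ray runtime

@ray.remote
def square(x: int) -> int:
    # Placeholder workload; in practice this would be model or data work.
    return x * x

# Tasks run in parallel as remote workers; ray.get collects the results.
print(ray.get([square.remote(i) for i in range(4)]))  # [0, 1, 4, 9]
```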
Find secrets with Gitleaks 🔑
- Updated Jan 8, 2026 - Go
This project shares the technical principles behind large language models along with hands-on experience (LLM engineering and putting LLM applications into production).
- Updated Dec 30, 2025 - HTML
20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.
- Updated Feb 17, 2026 - Python
Run any open-source LLM, such as DeepSeek or Llama, as an OpenAI-compatible API endpoint in the cloud.
- Updated Feb 16, 2026 - Python
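Because the endpoint speaks the OpenAI protocol, any OpenAI client can talk to it by overriding the base URL. A minimal sketch; the URL and model name are assumptions that depend on your deployment:

```python
from openai import OpenAI

# Base URL and model name are assumptions; use your deployment's values.
client = OpenAI(base_url="http://localhost:3000/v1", api_key="not-needed")
resp = client.chat.completions.create(
    model="deepseek-r1",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```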
Official inference library for Mistral models
- Updated Nov 21, 2025 - Jupyter Notebook
OpenVINO™ is an open source toolkit for optimizing and deploying AI inference.
- Updated Feb 20, 2026 - C++
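A minimal sketch of the OpenVINO Python workflow (read, compile, infer); the model path and input shape are assumptions:

```python
import numpy as np
import openvino as ov

core = ov.Core()
model = core.read_model("model.xml")         # path is an assumption
compiled = core.compile_model(model, "CPU")  # choose a target device
# Input shape is an assumption; compiled models are directly callable.
outputs = compiled(np.zeros((1, 3, 224, 224), dtype=np.float32))
```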
High-speed Large Language Model Serving for Local Deployment
- Updated Jan 24, 2026 - C++
The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!
- Updated Feb 11, 2026 - Python
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
- Updated Feb 13, 2026 - Python
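A minimal sketch of LMDeploy's offline pipeline API; the model ID is an assumption (any supported Hugging Face model works):

```python
from lmdeploy import pipeline

# Model ID is an assumption; LMDeploy fetches it from the Hugging Face Hub.
pipe = pipeline("internlm/internlm2_5-7b-chat")
responses = pipe(["Summarize what LLM serving involves."])
print(responses[0].text)
```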
Open-source implementation of AlphaEvolve
- Updated Feb 4, 2026 - Python
Superduper: End-to-end framework for building custom AI applications and agents.
- Updated Sep 1, 2025 - Python
Delivery infrastructure for agentic apps - Plano is an AI-native proxy and data plane that offloads plumbing work, so you stay focused on your agent's core logic (via any AI framework).
- Updated Feb 19, 2026 - Rust
Standardized Distributed Generative and Predictive AI Inference Platform for Scalable, Multi-Framework Deployment on Kubernetes
- Updated Feb 20, 2026 - Go
📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉
- Updated Jan 18, 2026 - Python
FlashInfer: Kernel Library for LLM Serving
- Updated Feb 20, 2026 - Python
Eko (Eko Keeps Operating) - Build Production-ready Agentic Workflow with Natural Language - eko.fellou.ai
- Updated Jan 14, 2026 - TypeScript
Performance-optimized AI inference on your GPUs. Unlock superior throughput by selecting and tuning engines like vLLM or SGLang.
- Updated Feb 17, 2026 - Python
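As a point of reference for what such engines look like at the API level, a minimal sketch of vLLM's offline interface; the model name is an assumption:

```python
from vllm import LLM, SamplingParams

# Model name is an assumption; any supported Hugging Face causal LM works.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
params = SamplingParams(temperature=0.8, max_tokens=128)
outputs = llm.generate(["Paged attention reduces KV-cache waste by"], params)
print(outputs[0].outputs[0].text)
```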
Low-latency AI inference engine for mobile devices & wearables
- Updated Feb 20, 2026 - C