llm-inference
Here are 1,377 public repositories matching this topic...
GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.
- Updated May 27, 2025 - C++
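GPT4All ships Python bindings for running GGUF models fully offline. A minimal sketch, assuming the `gpt4all` package is installed; the model filename is a placeholder and any GGUF model supported by GPT4All would work:

```python
# Minimal local generation with GPT4All's Python bindings (pip install gpt4all).
# The model filename below is an assumption; GPT4All downloads it on first use.
from gpt4all import GPT4All

model = GPT4All("orca-mini-3b-gguf2-q4_0.gguf")
with model.chat_session():
    reply = model.generate("Why run LLM inference locally?", max_tokens=128)
    print(reply)
```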
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
- Updated Dec 17, 2025 - Python
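Ray's core runtime schedules ordinary Python functions as distributed tasks, which is the building block its AI libraries sit on. A minimal sketch of that pattern, runnable on a single machine:

```python
# Fan out stateless tasks with Ray's core API and gather the results.
import ray

ray.init()  # connects to an existing cluster, or starts a local one

@ray.remote
def square(x: int) -> int:
    return x * x

futures = [square.remote(i) for i in range(8)]
print(ray.get(futures))  # [0, 1, 4, 9, 16, 25, 36, 49]
```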
Find secrets with Gitleaks 🔑
- Updated Dec 9, 2025 - Go
This project aims to share the technical principles behind large language models along with hands-on experience (LLM engineering, bringing LLM applications to production).
- Updated Dec 3, 2025 - HTML
20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.
- Updated Dec 16, 2025 - Python
Run any open-source LLMs, such as DeepSeek and Llama, as an OpenAI-compatible API endpoint in the cloud.
- Updated Dec 15, 2025 - Python
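Because such a server speaks the OpenAI wire protocol, any standard OpenAI client can talk to it. A minimal sketch; the base_url and model name are assumptions that depend on how the server was launched:

```python
# Query an OpenAI-compatible endpoint with the official openai client.
# base_url and model name below are hypothetical deployment details.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:3000/v1", api_key="not-needed")
resp = client.chat.completions.create(
    model="llama-3.1-8b-instruct",  # whatever model the server exposes
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```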
Official inference library for Mistral models
- Updated Nov 21, 2025 - Jupyter Notebook
OpenVINO™ is an open source toolkit for optimizing and deploying AI inference
- Updated Dec 17, 2025 - C++
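The typical OpenVINO flow is to read a model, compile it for a target device, then run inference. A minimal sketch, where the IR file path and input shape are placeholders:

```python
# Read -> compile -> infer with the OpenVINO runtime API.
import numpy as np
import openvino as ov

core = ov.Core()
model = core.read_model("model.xml")         # hypothetical IR model file
compiled = core.compile_model(model, "CPU")  # or "GPU", "AUTO", ...
input_tensor = np.zeros((1, 3, 224, 224), dtype=np.float32)  # placeholder shape
result = compiled([input_tensor])            # results keyed by output port
print(list(result.values())[0].shape)
```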
High-speed Large Language Model Serving for Local Deployment
- Updated Aug 2, 2025 - C++
The easiest way to serve AI apps and models: build model inference APIs, job queues, LLM apps, multi-model pipelines, and more!
- Updated Dec 15, 2025 - Python
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
- Updated Dec 17, 2025 - Python
Superduper: End-to-end framework for building custom AI applications and agents.
- Updated Sep 1, 2025 - Python
Standardized Distributed Generative and Predictive AI Inference Platform for Scalable, Multi-Framework Deployment on Kubernetes
- Updated Dec 16, 2025 - Shell
Open-source implementation of AlphaEvolve
- Updated Dec 15, 2025 - Python
📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉
- Updated Nov 28, 2025 - Python
Eko (Eko Keeps Operating) - Build production-ready agentic workflows with natural language - eko.fellou.ai
- Updated Dec 17, 2025 - TypeScript
Delivery infrastructure for agents. Arch is a models-native proxy and data plane for agents that handles plumbing work in AI - like agent routing and orchestration, guardrails, zero-code logs and traces, and unified access to LLMs (OpenAI, Anthropic, Ollama, etc.). Build agents faster and deliver them reliably to prod.
- Updated Dec 17, 2025 - Rust
FlashInfer: Kernel Library for LLM Serving
- Updated Dec 17, 2025 - Cuda
GPU cluster manager for optimized AI model deployment
- Updated Dec 17, 2025 - Python
Kernels & AI inference engine for mobile devices.
- Updated Dec 17, 2025 - C++