llm-inference
Here are 1,079 public repositories matching this topic...
GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.
- Updated May 27, 2025 (C++)
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
- Updated Jul 18, 2025 (Python)
Find secrets with Gitleaks 🔑
- Updated Jul 15, 2025 (Go)
This project shares the technical principles behind large language models along with hands-on experience (LLM engineering and real-world LLM application deployment).
- Updated Jul 10, 2025 (HTML)
20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.
- Updated Jul 18, 2025 (Python)
Run any open-source LLM, such as DeepSeek or Llama, as an OpenAI-compatible API endpoint in the cloud.
- Updated Jul 14, 2025 (Python)
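Several of the servers in this list expose an OpenAI-compatible chat-completions API, so the same client code works regardless of which backend is serving the model. A minimal sketch, assuming a hypothetical local server at `http://localhost:3000/v1` and an illustrative model name (both are assumptions, not taken from any specific project):

```python
import json

def build_chat_request(model: str, prompt: str, temperature: float = 0.7) -> dict:
    """Build the JSON body for an OpenAI-compatible /v1/chat/completions call."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }

# Illustrative model name; substitute whatever your server actually hosts.
body = build_chat_request("llama-3-8b-instruct",
                          "Summarize LLM inference in one sentence.")
print(json.dumps(body, indent=2))

# Sending the request (requires a running server; the URL is an assumption):
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:3000/v1/chat/completions",
#     data=json.dumps(body).encode(),
#     headers={"Content-Type": "application/json"},
# )
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the wire format is shared, switching between hosted and locally served models is usually just a change of base URL and model name.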
Official inference library for Mistral models
- Updated Mar 20, 2025 (Jupyter Notebook)
OpenVINO™ is an open source toolkit for optimizing and deploying AI inference
- Updated Jul 18, 2025 (C++)
High-speed Large Language Model Serving for Local Deployment
- Updated Feb 19, 2025 (C++)
The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!
- Updated Jul 18, 2025 (Python)
🚀 A best-in-class real-time conversational digital human for mobile. Supports local deployment and multimodal interaction (voice, text, facial expressions) with response latency under 1.5 seconds. Suited to live streaming, education, customer service, finance, and government scenarios with strict privacy and real-time requirements. Works out of the box and is developer friendly.
- Updated Jul 18, 2025 (C++)
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
- Updated Jul 18, 2025 (Python)
Superduper: End-to-end framework for building custom AI applications and agents.
- Updated Jul 16, 2025 (Python)
Standardized Serverless ML Inference Platform on Kubernetes
- Updated Jul 16, 2025 (Python)
📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉
- Updated Jul 14, 2025 (Python)
Eko (Eko Keeps Operating) - Build production-ready agentic workflows with natural language - eko.fellou.ai
- Updated Jul 12, 2025 (TypeScript)
FlashInfer: Kernel Library for LLM Serving
- Updated Jul 18, 2025 (Cuda)
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
- Updated May 21, 2025 (Python)
The edge and AI gateway for agents. Arch is an intelligent proxy server that handles the low-level work of building agents, such as applying guardrails, routing prompts to the right agent, and unifying access to any LLM. It is a framework-agnostic infrastructure layer that helps you build production-grade agents faster.
- Updated Jul 17, 2025 (Rust)
Generative AI reference workflows optimized for accelerated infrastructure and microservice architecture.
- Updated Jul 16, 2025 (Jupyter Notebook)