llm-inference
Here are 843 public repositories matching this topic...
GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.
- Updated Mar 19, 2025 - C++
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
- Updated Mar 24, 2025 - Python
Find secrets with Gitleaks 🔑
- Updated Mar 24, 2025 - Go
This project shares the technical principles behind large language models along with hands-on experience (LLM engineering and deploying LLM applications in production).
- Updated Mar 2, 2025 - HTML
20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.
- Updated Mar 24, 2025 - Python
Run any open-source LLMs, such as DeepSeek and Llama, as OpenAI-compatible API endpoints in the cloud.
- Updated Mar 24, 2025 - Python
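Several projects in this list expose self-hosted models behind an OpenAI-compatible endpoint, meaning clients reuse the standard chat-completions request shape regardless of which model serves it. A minimal sketch of building such a request body (the base URL, model name, and helper function here are hypothetical placeholders, not any specific project's API):

```python
import json

# Hypothetical address of a self-hosted OpenAI-compatible server (placeholder).
BASE_URL = "http://localhost:8000/v1"

def chat_completion_request(model: str, prompt: str, temperature: float = 0.7) -> dict:
    """Build the JSON body for a POST to {BASE_URL}/chat/completions."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }

body = chat_completion_request("deepseek-r1", "Hello!")
print(json.dumps(body, indent=2))
```

Because the request shape is shared, swapping the served model typically only changes the `model` field and the base URL.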
Official inference library for Mistral models
- Updated Mar 20, 2025 - Jupyter Notebook
High-speed Large Language Model Serving for Local Deployment
- Updated Feb 19, 2025 - C++
OpenVINO™ is an open source toolkit for optimizing and deploying AI inference
- Updated Mar 24, 2025 - C++
The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!
- Updated Mar 24, 2025 - Python
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
- Updated Mar 24, 2025 - Python
Superduper: Build end-to-end AI applications and agent workflows on your existing data infrastructure and preferred tools - without migrating your data.
- Updated Mar 24, 2025 - Jupyter Notebook
Standardized Serverless ML Inference Platform on Kubernetes
- Updated Mar 24, 2025 - Python
📖A curated list of Awesome LLM/VLM Inference Papers with codes: WINT8/4, FlashAttention, PagedAttention, MLA, Parallelism, etc. 🎉🎉
- Updated Mar 4, 2025
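One technique that list covers, PagedAttention, manages the KV cache like virtual memory: logical token positions map to fixed-size physical blocks through a per-sequence block table, so memory is allocated on demand rather than reserved for the maximum sequence length. A toy sketch of that mapping (all class and field names here are illustrative, not any library's API):

```python
# Toy paged KV cache: a per-sequence block table maps logical token slots
# to fixed-size physical blocks, allocated lazily as the sequence grows.

BLOCK = 4  # tokens per physical block

class PagedCache:
    def __init__(self):
        self.blocks = []   # physical storage: list of token lists
        self.tables = {}   # seq_id -> list of physical block indices

    def append(self, seq_id, token):
        table = self.tables.setdefault(seq_id, [])
        if not table or len(self.blocks[table[-1]]) == BLOCK:
            self.blocks.append([])              # allocate a new block lazily
            table.append(len(self.blocks) - 1)
        self.blocks[table[-1]].append(token)

    def tokens(self, seq_id):
        """Reassemble the logical sequence by walking the block table."""
        return [t for b in self.tables.get(seq_id, []) for t in self.blocks[b]]

cache = PagedCache()
for t in range(6):
    cache.append("seq0", t)
```

Six tokens occupy two blocks here; per-sequence waste is bounded by one partially filled block instead of the whole preallocated context window.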
Sparsity-aware deep learning inference runtime for CPUs
- Updated Jul 19, 2024 - Python
Generative AI reference workflows optimized for accelerated infrastructure and microservice architecture.
- Updated Mar 18, 2025 - Python
Eko (Eko Keeps Operating) - Build Production-ready Agentic Workflow with Natural Language - eko.fellou.ai
- Updated Mar 24, 2025 - TypeScript
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
- Updated Mar 7, 2025 - Python
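Serving thousands of fine-tunes is feasible because LoRA fine-tuning stores each adapter as a low-rank delta: the effective weight is W + B·A with B and A tiny relative to W, so one frozen base model can be shared while per-request adapters are swapped in cheaply. A minimal NumPy sketch of the idea (shapes and names are illustrative, assuming rank r much smaller than the hidden size):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2                      # hidden size, LoRA rank (r << d)

W = rng.standard_normal((d, d))  # shared, frozen base weight

# Two per-adapter low-rank deltas: each effective weight is W + B @ A.
adapters = {
    "adapter_a": (rng.standard_normal((d, r)), rng.standard_normal((r, d))),
    "adapter_b": (rng.standard_normal((d, r)), rng.standard_normal((r, d))),
}

def forward(x: np.ndarray, adapter: str) -> np.ndarray:
    """Apply the shared base layer plus the selected low-rank adapter."""
    B, A = adapters[adapter]
    return x @ W + (x @ B) @ A   # never materializes W + B @ A

x = rng.standard_normal((1, d))
ya = forward(x, "adapter_a")
yb = forward(x, "adapter_b")
```

Each adapter costs 2·d·r parameters instead of d², which is what makes batching requests for many different fine-tunes against one base model practical.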
Code examples and resources for DBRX, a large language model developed by Databricks
- Updated May 1, 2024 - Python
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
- Updated Jun 25, 2024 - Jupyter Notebook