llm-inference
Here are 1,079 public repositories matching this topic...
GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.
- Updated May 27, 2025 (C++)
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
- Updated Jul 18, 2025 (Python)
Find secrets with Gitleaks 🔑
- Updated Jul 15, 2025 (Go)
This project shares the technical principles behind large language models along with hands-on experience (LLM engineering and real-world LLM application deployment).
- Updated Jul 10, 2025 (HTML)
20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.
- Updated Jul 18, 2025 (Python)
Run any open-source LLM, such as DeepSeek or Llama, as an OpenAI-compatible API endpoint in the cloud.
- Updated Jul 14, 2025 (Python)
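Several of the servers in this list expose an OpenAI-compatible chat-completions API, so the same client code works regardless of which backend is serving the model. A minimal sketch, assuming a hypothetical local server at `http://localhost:3000/v1` and an illustrative model name (both are assumptions, not taken from any specific project):

```python
import json

def build_chat_request(model: str, prompt: str, temperature: float = 0.7) -> dict:
    """Build the JSON body for an OpenAI-compatible /v1/chat/completions call."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }

# Illustrative model name; substitute whatever your server actually hosts.
body = build_chat_request("llama-3-8b-instruct",
                          "Summarize LLM inference in one sentence.")
print(json.dumps(body, indent=2))

# Sending the request (requires a running server; the URL is an assumption):
# import urllib.request
# req = urllib.request.Request(
#     "http://localhost:3000/v1/chat/completions",
#     data=json.dumps(body).encode(),
#     headers={"Content-Type": "application/json"},
# )
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the wire format is shared, switching between hosted and locally served models is usually just a change of base URL and model name.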
Official inference library for Mistral models
- Updated Mar 20, 2025 (Jupyter Notebook)
OpenVINO™ is an open source toolkit for optimizing and deploying AI inference
- Updated Jul 18, 2025 (C++)
High-speed Large Language Model Serving for Local Deployment
- Updated Feb 19, 2025 (C++)
The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!
- Updated Jul 18, 2025 (Python)
🚀 A best-in-class real-time conversational digital human for mobile. Supports local deployment and multimodal interaction (voice, text, facial expressions) with response latency under 1.5 seconds. Suited to live streaming, education, customer service, finance, and government scenarios with strict privacy and real-time requirements. Works out of the box and is developer friendly.
- Updated Jul 18, 2025 (C++)
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
- Updated Jul 18, 2025 (Python)
Superduper: End-to-end framework for building custom AI applications and agents.
- Updated Jul 16, 2025 (Python)
Standardized Serverless ML Inference Platform on Kubernetes
- Updated Jul 16, 2025 (Python)
📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉
- Updated Jul 14, 2025 (Python)
Eko (Eko Keeps Operating) - Build production-ready agentic workflows with natural language - eko.fellou.ai
- Updated Jul 12, 2025 (TypeScript)
FlashInfer: Kernel Library for LLM Serving
- Updated Jul 18, 2025 (Cuda)
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
- Updated May 21, 2025 (Python)
The edge and AI gateway for agents. Arch is an intelligent proxy server that handles the low-level work of building agents, such as applying guardrails, routing prompts to the right agent, and unifying access to any LLM. It is a framework-agnostic infrastructure layer that helps you build production-grade agents faster.
- Updated Jul 17, 2025 (Rust)
Generative AI reference workflows optimized for accelerated infrastructure and microservice architecture.
- Updated Jul 16, 2025 (Jupyter Notebook)