llama-cpp
Here are 116 public repositories matching this topic...
Maid is a cross-platform Flutter app for interfacing with GGUF / llama.cpp models locally, and with Ollama and OpenAI models remotely.
Updated Apr 29, 2025 - Dart
Run AI models locally on your machine with Node.js bindings for llama.cpp. Enforce a JSON schema on the model output at the generation level.
Updated Mar 28, 2025 - TypeScript
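This description matches node-llama-cpp. A minimal sketch of its schema-constrained generation, assuming the v3 API (getLlama, createGrammarForJsonSchema); the model path, schema, and prompt are placeholder values:

```typescript
import {getLlama, LlamaChatSession} from "node-llama-cpp";

// Load a local GGUF model (placeholder path — point it at your own file).
const llama = await getLlama();
const model = await llama.loadModel({modelPath: "models/model.gguf"});
const context = await model.createContext();
const session = new LlamaChatSession({contextSequence: context.getSequence()});

// Build a grammar from a JSON schema; token sampling is then constrained
// so the output always parses as a matching JSON object.
const grammar = await llama.createGrammarForJsonSchema({
    type: "object",
    properties: {
        title: {type: "string"},
        rating: {type: "number"}
    }
});

const answer = await session.prompt(
    "Recommend a sci-fi book and rate it from 1 to 10.",
    {grammar}
);
const result = grammar.parse(answer); // object conforming to the schema
console.log(result.title, result.rating);
```

Because the schema is enforced during generation rather than validated afterwards, malformed JSON never reaches your application code.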
prima.cpp: Speeding up 70B-scale LLM inference on low-resource everyday home clusters
Updated Apr 28, 2025 - C++
Rust bindings for llama.cpp.
Updated Jun 27, 2024 - Rust
This repo showcases how to run a model locally and offline, free of OpenAI dependencies.
Updated Jul 12, 2024 - Python
Local ML voice chat using high-end models.
Updated Apr 26, 2025 - C++
Review and check GGUF files, and estimate memory usage and maximum tokens per second.
Updated Apr 29, 2025 - Go
Making offline AI models accessible to all types of edge devices.
Updated Feb 12, 2024 - Dart
LLaMA Server combines the power of LLaMA C++ with the beauty of Chatbot UI.
Updated Jun 10, 2023 - Python