edgenai/llama_cpp-rs

High-level, optionally asynchronous Rust bindings to llama.cpp

Safe, high-level Rust bindings to the C++ project of the same name, meant to be as user-friendly as possible. Run GGUF-based large language models directly on your CPU in fifteen lines of code, no ML experience required!

// Imports below assume the published `llama_cpp` crate layout:
use std::io::{self, Write};

use llama_cpp::standard_sampler::StandardSampler;
use llama_cpp::{LlamaModel, LlamaParams, SessionParams};

// Create a model from anything that implements `AsRef<Path>`:
let model = LlamaModel::load_from_file("path_to_model.gguf", LlamaParams::default())
    .expect("Could not load model");

// A `LlamaModel` holds the weights shared across many _sessions_; while your model may be
// several gigabytes large, a session is typically a few dozen to a hundred megabytes!
let mut ctx = model
    .create_session(SessionParams::default())
    .expect("Failed to create session");

// You can feed anything that implements `AsRef<[u8]>` into the model's context.
ctx.advance_context("This is the story of a man named Stanley.").unwrap();

// LLMs are typically used to predict the next word in a sequence. Let's generate some tokens!
let max_tokens = 1024;
let mut decoded_tokens = 0;

// `ctx.start_completing_with` creates a worker thread that generates tokens. When the completion
// handle is dropped, tokens stop generating!
let completions = ctx.start_completing_with(StandardSampler::default(), 1024).into_strings();

for completion in completions {
    print!("{completion}");
    let _ = io::stdout().flush();

    decoded_tokens += 1;

    if decoded_tokens > max_tokens {
        break;
    }
}

This repository hosts the high-level bindings (crates/llama_cpp) as well as automatically generated bindings to llama.cpp's low-level C API (crates/llama_cpp_sys). Contributions are welcome; just keep the UX clean!

Building

Keep in mind that llama.cpp is very computationally heavy, meaning standard debug builds (running just cargo build/cargo run) will suffer greatly from the lack of optimisations. Therefore, unless debugging is really necessary, it is highly recommended to build and run using Cargo's --release flag.
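For example, the usual release-mode invocations look like this (run from the crate or project you are building; these are standard Cargo commands):

cargo build --release
cargo run --release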

Cargo Features

Several of llama.cpp's backends are supported through features (see the example after this list):

  • cuda - Enables the CUDA backend; the CUDA Toolkit is required to compile with this feature enabled.
  • vulkan - Enables the Vulkan backend; the Vulkan SDK is required to compile with this feature enabled.
  • metal - Enables the Metal backend; macOS only.
  • hipblas - Enables the hipBLAS/ROCm backend; ROCm is required to compile with this feature enabled.
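For example, to enable the CUDA backend when depending on the published llama_cpp crate from crates.io (any of the feature names above can be substituted; this assumes a Cargo version with the cargo add subcommand):

cargo add llama_cpp --features cuda

When working inside this repository, the same feature names can be passed to Cargo directly, e.g. cargo build --release --features cuda from within crates/llama_cpp.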

Experimental

These bindings also provide the ability to predict a context's size in memory. Note, however, that this is a highly experimental feature, as it isn't something llama.cpp itself provides. The returned values may be highly inaccurate, although an attempt is made to never return values lower than the real size.

License

MIT or Apache-2.0, at your option (the "Rust" license). See LICENSE-MIT and LICENSE-APACHE.


