# tiktoken-rs

Ready-made tokenizer library for working with GPT and tiktoken
A Rust library for tokenizing text with OpenAI models using tiktoken.

This library provides a set of ready-made tokenizers for working with GPT, tiktoken, and related OpenAI models. Use cases include tokenizing and counting tokens in text inputs.

It is built on top of the [tiktoken](https://github.com/openai/tiktoken) library and includes some additional features and enhancements for ease of use with Rust code.

For full working examples of all supported features, see the `examples` directory in the repository.
Install this crate locally with `cargo`:

```shell
cargo add tiktoken-rs
```
Then in your Rust code, call the API:

```rust
use tiktoken_rs::o200k_base;

let bpe = o200k_base().unwrap();
let tokens = bpe.encode_with_special_tokens("This is a sentence with spaces");
println!("Token count: {}", tokens.len());
```
Counting the `max_tokens` parameter for a chat completion request:

```rust
use tiktoken_rs::{get_chat_completion_max_tokens, ChatCompletionRequestMessage};

let messages = vec![
    ChatCompletionRequestMessage {
        content: Some("You are a helpful assistant that only speaks French.".to_string()),
        role: "system".to_string(),
        name: None,
        function_call: None,
    },
    ChatCompletionRequestMessage {
        content: Some("Hello, how are you?".to_string()),
        role: "user".to_string(),
        name: None,
        function_call: None,
    },
    ChatCompletionRequestMessage {
        content: Some("Parlez-vous francais?".to_string()),
        role: "system".to_string(),
        name: None,
        function_call: None,
    },
];
let max_tokens = get_chat_completion_max_tokens("o1-mini", &messages).unwrap();
println!("max_tokens: {}", max_tokens);
```
## Counting max_tokens parameter for a chat completion request with `async-openai`

You need to enable the `async-openai` feature in your `Cargo.toml` file.

```rust
use tiktoken_rs::async_openai::get_chat_completion_max_tokens;
use async_openai::types::{ChatCompletionRequestMessage, Role};

let messages = vec![
    ChatCompletionRequestMessage {
        content: Some("You are a helpful assistant that only speaks French.".to_string()),
        role: Role::System,
        name: None,
        function_call: None,
    },
    ChatCompletionRequestMessage {
        content: Some("Hello, how are you?".to_string()),
        role: Role::User,
        name: None,
        function_call: None,
    },
    ChatCompletionRequestMessage {
        content: Some("Parlez-vous francais?".to_string()),
        role: Role::System,
        name: None,
        function_call: None,
    },
];
let max_tokens = get_chat_completion_max_tokens("o1-mini", &messages).unwrap();
println!("max_tokens: {}", max_tokens);
```
`tiktoken` supports these encodings used by OpenAI models:

| Encoding name | OpenAI models |
|---|---|
| `o200k_base` | GPT-4o models, o1 models |
| `cl100k_base` | ChatGPT models, `text-embedding-ada-002` |
| `p50k_base` | Code models, `text-davinci-002`, `text-davinci-003` |
| `p50k_edit` | Use for edit models like `text-davinci-edit-001`, `code-davinci-edit-001` |
| `r50k_base` (or `gpt2`) | GPT-3 models like `davinci` |
See the examples in the repo for use cases. For more context on the different tokenizers, see the OpenAI Cookbook.
If you encounter any bugs or have any suggestions for improvements, please open an issue on the repository.
Thanks to @spolu for the original code and the `.tiktoken` files.
This project is licensed under the MIT License.