# tiktoken-rs

Ready-made tokenizer library for working with GPT and tiktoken

Rust library for tokenizing text with OpenAI models using tiktoken.

This library provides a set of ready-made tokenizers for working with GPT, tiktoken, and related OpenAI models. Use cases cover tokenizing and counting tokens in text inputs.

This library is built on top of the tiktoken library and includes some additional features and enhancements for ease of use with Rust code.

## Examples

For full working examples of all supported features, see the `examples` directory in the repository.

## Usage

1. Install this tool locally with `cargo`:

```sh
cargo add tiktoken-rs
```

Then in your Rust code, call the API:

### Counting token length

```rust
use tiktoken_rs::o200k_base;

let bpe = o200k_base().unwrap();
let tokens = bpe.encode_with_special_tokens("This is a sentence   with spaces");
println!("Token count: {}", tokens.len());
```

### Counting the max_tokens parameter for a chat completion request

```rust
use tiktoken_rs::{get_chat_completion_max_tokens, ChatCompletionRequestMessage};

let messages = vec![
    ChatCompletionRequestMessage {
        content: Some("You are a helpful assistant that only speaks French.".to_string()),
        role: "system".to_string(),
        name: None,
        function_call: None,
    },
    ChatCompletionRequestMessage {
        content: Some("Hello, how are you?".to_string()),
        role: "user".to_string(),
        name: None,
        function_call: None,
    },
    ChatCompletionRequestMessage {
        content: Some("Parlez-vous francais?".to_string()),
        role: "system".to_string(),
        name: None,
        function_call: None,
    },
];
let max_tokens = get_chat_completion_max_tokens("o1-mini", &messages).unwrap();
println!("max_tokens: {}", max_tokens);
```

### Counting the max_tokens parameter for a chat completion request with `async-openai`

You need to enable the `async-openai` feature in your `Cargo.toml` file.

```rust
use tiktoken_rs::async_openai::get_chat_completion_max_tokens;
use async_openai::types::{ChatCompletionRequestMessage, Role};

let messages = vec![
    ChatCompletionRequestMessage {
        content: Some("You are a helpful assistant that only speaks French.".to_string()),
        role: Role::System,
        name: None,
        function_call: None,
    },
    ChatCompletionRequestMessage {
        content: Some("Hello, how are you?".to_string()),
        role: Role::User,
        name: None,
        function_call: None,
    },
    ChatCompletionRequestMessage {
        content: Some("Parlez-vous francais?".to_string()),
        role: Role::System,
        name: None,
        function_call: None,
    },
];
let max_tokens = get_chat_completion_max_tokens("o1-mini", &messages).unwrap();
println!("max_tokens: {}", max_tokens);
```

tiktoken supports these encodings used by OpenAI models:

| Encoding name | OpenAI models |
| ------------- | ------------- |
| `o200k_base` | GPT-4o models, o1 models |
| `cl100k_base` | ChatGPT models, `text-embedding-ada-002` |
| `p50k_base` | Code models, `text-davinci-002`, `text-davinci-003` |
| `p50k_edit` | Use for edit models like `text-davinci-edit-001`, `code-davinci-edit-001` |
| `r50k_base` (or `gpt2`) | GPT-3 models like `davinci` |

See the examples in the repo for use cases. For more context on the different tokenizers, see the OpenAI Cookbook.

## Encountered any bugs?

If you encounter any bugs or have any suggestions for improvements, please open an issue on the repository.

## Acknowledgements

Thanks @spolu for the original code and `.tiktoken` files.

## License

This project is licensed under the MIT License.
