# tiktoken-rs

Ready-made tokenizer library for working with GPT and tiktoken
A Rust library for tokenizing text with OpenAI models using tiktoken.

This library provides a set of ready-made tokenizers for working with GPT, tiktoken, and related OpenAI models. Use cases include tokenizing and counting tokens in text inputs.

It is built on top of the [tiktoken](https://github.com/openai/tiktoken) library and includes some additional features and enhancements for ease of use with Rust code.

For full working examples of all supported features, see the `examples` directory in the repository.
Install this crate locally with `cargo`:

```shell
cargo add tiktoken-rs
```
Then in your Rust code, call the API:

```rust
use tiktoken_rs::o200k_base;

let bpe = o200k_base().unwrap();
let tokens = bpe.encode_with_special_tokens("This is a sentence with spaces");
println!("Token count: {}", tokens.len());
```
Counting the `max_tokens` parameter for a chat completion request:

```rust
use tiktoken_rs::{get_chat_completion_max_tokens, ChatCompletionRequestMessage};

let messages = vec![
    ChatCompletionRequestMessage {
        content: Some("You are a helpful assistant that only speaks French.".to_string()),
        role: "system".to_string(),
        name: None,
        function_call: None,
    },
    ChatCompletionRequestMessage {
        content: Some("Hello, how are you?".to_string()),
        role: "user".to_string(),
        name: None,
        function_call: None,
    },
    ChatCompletionRequestMessage {
        content: Some("Parlez-vous francais?".to_string()),
        role: "system".to_string(),
        name: None,
        function_call: None,
    },
];
let max_tokens = get_chat_completion_max_tokens("o1-mini", &messages).unwrap();
println!("max_tokens: {}", max_tokens);
```
## Counting max_tokens parameter for a chat completion request with `async-openai`

You need to enable the `async-openai` feature in your `Cargo.toml` file.

```rust
use tiktoken_rs::async_openai::get_chat_completion_max_tokens;
use async_openai::types::{ChatCompletionRequestMessage, Role};

let messages = vec![
    ChatCompletionRequestMessage {
        content: Some("You are a helpful assistant that only speaks French.".to_string()),
        role: Role::System,
        name: None,
        function_call: None,
    },
    ChatCompletionRequestMessage {
        content: Some("Hello, how are you?".to_string()),
        role: Role::User,
        name: None,
        function_call: None,
    },
    ChatCompletionRequestMessage {
        content: Some("Parlez-vous francais?".to_string()),
        role: Role::System,
        name: None,
        function_call: None,
    },
];
let max_tokens = get_chat_completion_max_tokens("o1-mini", &messages).unwrap();
println!("max_tokens: {}", max_tokens);
```
`tiktoken` supports these encodings used by OpenAI models:

| Encoding name | OpenAI models |
|---|---|
| `o200k_base` | GPT-4o models, o1 models |
| `cl100k_base` | ChatGPT models, `text-embedding-ada-002` |
| `p50k_base` | Code models, `text-davinci-002`, `text-davinci-003` |
| `p50k_edit` | Use for edit models like `text-davinci-edit-001`, `code-davinci-edit-001` |
| `r50k_base` (or `gpt2`) | GPT-3 models like `davinci` |
See the examples in the repo for use cases. For more context on the different tokenizers, see the OpenAI Cookbook.
If you encounter any bugs or have any suggestions for improvements, please open an issue on the repository.
Thanks to @spolu for the original code and the `.tiktoken` files.
This project is licensed under the MIT License.