All Gemini 1.0 and Gemini 1.5 models are now retired.
To avoid service disruption, update to a newer model (for example,gemini-2.5-flash-lite).Learn more.

Count tokens for Gemini models

Gemini models process input and output in units calledtokens.

Tokens can be single characters likez or whole words likecat. Long wordsare broken up into several tokens. The set of all tokens used by the model iscalled the vocabulary, and the process of splitting text into tokens is calledtokenization.

ForGemini models, a token is equivalent to about 4 characters.100 tokens is equal to about 60-80 English words.

Each model has amaximum number of tokensthat it can handle in a prompt and response. Knowing the token count of yourprompt lets you know if you've exceeded this limit. Additionally, the cost of arequest is determined in part by the number of input and output tokens, soknowing how to count tokens can be helpful.

Tip: To control the number of tokens used for generating a response (andthus control costs), you can set the thinking budget(for 2.5 models only) and themaxOutputTokens (allGemini models) inthemodel's configuration.

Note thatGemini 1.0 and 1.5 models also supported a"billable characters" count and pricing, but since those models are all eitherretired or soon-to-be-retired, this page does not describe anything aboutbillable characters.

Supported models

gemini-2.5-pro
gemini-2.5-flash
gemini-2.5-flash-lite
gemini-2.0-flash-001 (and its auto-updated aliasgemini-2.0-flash)
gemini-2.0-flash-lite-001 (and its auto-updated aliasgemini-2.0-flash-lite)
gemini-2.0-flash-preview-image-generation

Note: Although all generative models process input and output as tokens, this page and its token counting options are specific only to theGemini models listed above. Specifically, note that theGemini 2.0 Flash Live model is not supported.

ForImagen models, pricing and limits aren't based on tokens.

Options for counting tokens

All input and output for theGemini API is tokenized, including text, imagefiles, and other non-text modalities. Here are the options for counting tokens:

Check the token count for yourrequests only (before sending them to the model).

CallcountTokens with the input of the requestbefore sending it to the model. This returns:

total_tokens: token count of theinput only

Check the token count forboth your requests and responses.

Use theusageMetadata attribute on the response object. This includes:

prompt_token_count: token count of the input only
candidates_token_count: token count of the output only (does not include thinking tokens)
thoughts_token_count: token count of any thinking tokens used to generate the response
total_token_count: total count of tokens forboth the input and the output (includes any thinking tokens)

When streaming output, theusageMetadata attribute only appears on the last chunk of the stream. It'snil for intermediate chunks.

Note the following points about the options above:

They willnot count the number of input images or the number of seconds invideo or audio input files. However, the token count for each of thesemodalities willcorrelate with these values.

The input token count includes the prompt (text and any input files) aswell as any system instructions and tools.

The output token count does not include any thinking tokens; those areprovided in a separate field.

Review the additional information specific to each type of requestlater on this page.

Pricing for these options

CallingcountTokens: There's no charge for callingcountTokens(the Count Tokens API). The maximum quota for the Count Tokens API is 3000requests per minute (RPM).
Using theusageMetadata attribute: This attribute is always returned aspart of the response and doesn't incur any tokens or charge itself.

Additional information

Here's some additional information when working with specific types of requests.

Count text input tokens

No additional information.

Count multi-turn (chat) tokens

Note the following for callingcountTokens when using chat:

If you callcountTokens with the chat history, it returns the totaltoken count from both roles in the chat (total_tokens).
To understand how big your next conversational turn will be, you need toappend it to the history when you callcountTokens.

Count multimodal input tokens

Note the following points about counting tokens with multimodal input:

You can optionally callcountTokens on the text and the file separately.
For both token counting options, you'll get the same token count whetheryou provide the file as inline data or using its URL.

Image input files

Image input files are converted to tokens based on their dimensions:

Image inputs withboth dimensions less than or equal to 384 pixels: eachimage is counted as 258 tokens.
Image inputs that are larger in one or both dimensions: each image iscropped and scaled as needed into tiles of 768x768 pixels, and then eachtile is counted as 258 tokens.

Video and audio input files

Video and audio input files are converted to tokens at the following fixedrates:

Video: 263 tokens per second
Audio: 32 tokens per second

Document (like PDFs) input files

PDF input files are treated as images, so each page of a PDF is tokenized in thesame way as an image.

Except as otherwise noted, the content of this page is licensed under theCreative Commons Attribution 4.0 License, and code samples are licensed under theApache 2.0 License. For details, see theGoogle Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-10-03 UTC.

Movatterモバイル変換

Count tokens for Gemini models Stay organized with collections Save and categorize content based on your preferences.