To avoid service disruption, update to anewer model (for example,
gemini-2.5-flash-lite
).Learn more. Count tokens for Gemini models Stay organized with collections Save and categorize content based on your preferences.
Gemini models process input and output in units calledtokens.
Tokens can be single characters likez
or whole words likecat
. Long wordsare broken up into several tokens. The set of all tokens used by the model iscalled the vocabulary, and the process of splitting text into tokens is calledtokenization.
ForGemini models, a token is equivalent to about 4 characters.100 tokens is equal to about 60-80 English words.
Each model has amaximum number of tokensthat it can handle in a prompt and response. Knowing the token count of yourprompt lets you know if you've exceeded this limit. Additionally, the cost of arequest is determined in part by the number of input and output tokens, soknowing how to count tokens can be helpful.
Tip: To control the number of tokens used for generating a response (andthus control costs), you can set thethinking budget(for 2.5 models only) and themaxOutputTokens
(allGemini models) inthemodel's configuration.Note thatGemini 1.0 and 1.5 models also supported a"billable characters" count and pricing, but since those models are all eitherretired or soon-to-be-retired, this page does not describe anything aboutbillable characters.
Supported models
gemini-2.5-pro
gemini-2.5-flash
gemini-2.5-flash-lite
gemini-2.0-flash-001
(and its auto-updated aliasgemini-2.0-flash
)gemini-2.0-flash-lite-001
(and its auto-updated aliasgemini-2.0-flash-lite
)gemini-2.0-flash-preview-image-generation
ForImagen models, pricing and limits aren't based on tokens.
Options for counting tokens
All input and output for theGemini API is tokenized, including text, imagefiles, and other non-text modalities. Here are the options for counting tokens:
- Check the token count for yourrequests only (before sending them to the model).
- Call
countTokens
with the input of the requestbefore sending it to the model. This returns:total_tokens
: token count of theinput only
- Check the token count forboth your requests and responses.
- Use the
usageMetadata
attribute on the response object. This includes:prompt_token_count
: token count of the input onlycandidates_token_count
: token count of the output only (does not include thinking tokens)thoughts_token_count
: token count of any thinking tokens used to generate the responsetotal_token_count
: total count of tokens forboth the input and the output (includes any thinking tokens)
When streaming output, the
usageMetadata
attribute only appears on the last chunk of the stream. It'snil
for intermediate chunks.
Note the following points about the options above:
- They willnot count the number of input images or the number of seconds invideo or audio input files. However, the token count for each of thesemodalities willcorrelate with these values.
- The input token count includes the prompt (text and any input files) aswell as any system instructions and tools.
- The output token count does not include any thinking tokens; those areprovided in a separate field.
- Review theadditional information specific to each type of requestlater on this page.
Pricing for these options
Calling
countTokens
: There's no charge for callingcountTokens
(the Count Tokens API). The maximum quota for the Count Tokens API is 3000requests per minute (RPM).Using the
usageMetadata
attribute: This attribute is always returned aspart of the response and doesn't incur any tokens or charge itself.
Additional information
Here's some additional information when working with specific types of requests.
Count text input tokens
No additional information.
Count multi-turn (chat) tokens
Note the following for callingcountTokens
when using chat:
- If you call
countTokens
with the chat history, it returns the totaltoken count from both roles in the chat (total_tokens
). - To understand how big your next conversational turn will be, you need toappend it to the history when you call
countTokens
.
Count multimodal input tokens
Note the following points about counting tokens with multimodal input:
- You can optionally call
countTokens
on the text and the file separately. - For both token counting options, you'll get the same token count whetheryou provide the file as inline data or using its URL.
Image input files
Image input files are converted to tokens based on their dimensions:
- Image inputs withboth dimensions less than or equal to 384 pixels: eachimage is counted as 258 tokens.
- Image inputs that are larger in one or both dimensions: each image iscropped and scaled as needed into tiles of 768x768 pixels, and then eachtile is counted as 258 tokens.
Video and audio input files
Video and audio input files are converted to tokens at the following fixedrates:
- Video: 263 tokens per second
- Audio: 32 tokens per second
Document (like PDFs) input files
PDF input files are treated as images, so each page of a PDF is tokenized in thesame way as an image.
Except as otherwise noted, the content of this page is licensed under theCreative Commons Attribution 4.0 License, and code samples are licensed under theApache 2.0 License. For details, see theGoogle Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-10-03 UTC.