Use checkpoints in Gemini model tuning

A checkpoint is a snapshot of a model's state at a specific point in the fine-tuning process. You can use intermediate checkpoints in Gemini model tuning to do the following:

  • Save tuning progress.
  • Compare the performance of intermediate checkpoints.
  • Select the best-performing checkpoint, saved before overfitting sets in, as the default checkpoint.

For tuning jobs with fewer than 10 epochs, one checkpoint is saved approximately after each epoch. For tuning jobs with more than 10 epochs, around 10 checkpoints are saved at evenly spaced intervals, with the exception of the final checkpoint, which is saved immediately after all epochs are trained.
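The schedule described above can be approximated in a few lines. This is a rough sketch only: the function name `approximate_checkpoint_epochs` and the ceiling-based spacing are assumptions made for illustration, and treating a job with exactly 10 epochs like a short job is a guess, since the text covers only fewer or more than 10 epochs. The service saves checkpoints at approximate positions, so real step and epoch numbers may differ.

```python
def approximate_checkpoint_epochs(epoch_count: int, max_checkpoints: int = 10) -> list[int]:
    """Return the epochs after which a checkpoint is roughly expected.

    Hypothetical helper: it mirrors the prose description, not the
    service's actual scheduling logic.
    """
    if epoch_count < 1:
        raise ValueError("epoch_count must be at least 1")
    if epoch_count <= max_checkpoints:
        # Short jobs: roughly one checkpoint after each epoch.
        return list(range(1, epoch_count + 1))
    # Longer jobs: about max_checkpoints checkpoints, spread evenly.
    # Ceiling division keeps the last entry equal to epoch_count, matching
    # the final checkpoint being saved after all epochs are trained.
    return [-(-i * epoch_count // max_checkpoints) for i in range(1, max_checkpoints + 1)]


print(approximate_checkpoint_epochs(3))   # [1, 2, 3]
print(approximate_checkpoint_epochs(25))  # [3, 5, 8, 10, 13, 15, 18, 20, 23, 25]
```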

Intermediate checkpoints are deployed to new endpoints sequentially as tuning progresses. The tuned model endpoint represents the endpoint of the default checkpoint, and the tuned model checkpoints include all checkpoints and their corresponding endpoints.

Note: The endpoints listed in the tuned model only include those created by the tuning job, and the deployment status of each checkpoint and the corresponding endpoint only reflects the status during tuning. If you manually redeploy or undeploy the tuned checkpoints after tuning, see the Model Registry and Online prediction console pages for updated information.

Supported models

The following Gemini models support checkpoints:

For detailed information about Gemini model versions, see Google models and Model versions and lifecycle.

Note: Checkpoints are supported in the Google Gen AI SDK. They aren't supported in the Vertex AI SDK for Python.

Create a tuning job that exports checkpoints

You can create a tuning job that exports checkpoints by using the Google Gen AI SDK or the Google Cloud console.

Console

To create a tuning job that exports checkpoints, go to the Vertex AI Studio page and select the Tuning tab. For more information, see Tune a model.

Google Gen AI SDK

(Preview) You can configure the Gen AI evaluation service to run evaluations automatically after each checkpoint. This evaluation configuration is available in the us-central1 region.

import time

from google import genai
from google.genai.types import (
    CreateTuningJobConfig,
    EvaluationConfig,
    GcsDestination,
    HttpOptions,
    Metric,
    OutputConfig,
    TuningDataset,
)

# TODO(developer): Update and un-comment below line
# output_gcs_uri = "gs://your-bucket/your-prefix"

client = genai.Client(http_options=HttpOptions(api_version="v1beta1"))

training_dataset = TuningDataset(
    gcs_uri="gs://cloud-samples-data/ai-platform/generative_ai/gemini/text/sft_train_data.jsonl",
)

validation_dataset = TuningDataset(
    gcs_uri="gs://cloud-samples-data/ai-platform/generative_ai/gemini/text/sft_validation_data.jsonl",
)

evaluation_config = EvaluationConfig(
    metrics=[
        Metric(
            name="FLUENCY",
            prompt_template="""Evaluate this {prediction}""",
        )
    ],
    output_config=OutputConfig(
        gcs_destination=GcsDestination(
            output_uri_prefix=output_gcs_uri,
        )
    ),
)

tuning_job = client.tunings.tune(
    base_model="gemini-2.5-flash",
    training_dataset=training_dataset,
    config=CreateTuningJobConfig(
        tuned_model_display_name="Example tuning job",
        # Set to True to disable tuning intermediate checkpoints. Default is False.
        export_last_checkpoint_only=False,
        validation_dataset=validation_dataset,
        evaluation_config=evaluation_config,
    ),
)

running_states = set([
    "JOB_STATE_PENDING",
    "JOB_STATE_RUNNING",
])

# Poll once a minute until the job leaves the pending/running states.
while tuning_job.state in running_states:
    print(tuning_job.state)
    tuning_job = client.tunings.get(name=tuning_job.name)
    time.sleep(60)

print(tuning_job.tuned_model.model)
print(tuning_job.tuned_model.endpoint)
print(tuning_job.experiment)
# Example response:
# projects/123456789012/locations/us-central1/models/1234567890@1
# projects/123456789012/locations/us-central1/endpoints/123456789012345
# projects/123456789012/locations/us-central1/metadataStores/default/contexts/tuning-experiment-2025010112345678

if tuning_job.tuned_model.checkpoints:
    for i, checkpoint in enumerate(tuning_job.tuned_model.checkpoints):
        print(f"Checkpoint {i + 1}: ", checkpoint)
    # Example response:
    # Checkpoint 1:  checkpoint_id='1' epoch=1 step=10 endpoint='projects/123456789012/locations/us-central1/endpoints/123456789000000'
    # Checkpoint 2:  checkpoint_id='2' epoch=2 step=20 endpoint='projects/123456789012/locations/us-central1/endpoints/123456789012345'
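The polling loop in the sample runs until the job leaves the pending or running state, with no upper bound on the wait. A minimal sketch of the same loop with a timeout follows. The helper name `wait_for_terminal_state`, its parameters, and the fake state sequence used to exercise it are all hypothetical. In real use, `get_state` would wrap a call such as `client.tunings.get(name=...)` from the sample and return the job's state name.

```python
import time
from typing import Callable

# Running-state names taken from the tuning sample.
RUNNING_STATES = {"JOB_STATE_PENDING", "JOB_STATE_RUNNING"}


def wait_for_terminal_state(
    get_state: Callable[[], str],
    poll_seconds: float = 60.0,
    timeout_seconds: float = 6 * 60 * 60,
) -> str:
    """Poll get_state() until it returns a non-running state or time runs out."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        state = get_state()
        if state not in RUNNING_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job still in {state} after {timeout_seconds} seconds")
        time.sleep(poll_seconds)


# Stand-in for a real job: a made-up sequence of states for demonstration.
fake_states = iter(["JOB_STATE_PENDING", "JOB_STATE_RUNNING", "JOB_STATE_SUCCEEDED"])
final_state = wait_for_terminal_state(lambda: next(fake_states), poll_seconds=0.01)
print(final_state)
```

The default timeout of six hours is an arbitrary placeholder; choose a bound that fits the size of your dataset and epoch count.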

List the checkpoints for a tuning job

You can view the checkpoints for your completed tuning job in the Google Cloud console or list them by using the Google Gen AI SDK.

If intermediate checkpoints are disabled, only the final checkpoint is displayed or returned.

Console

  1. To locate your tuned model in the Google Cloud console, go to the Vertex AI Studio page.

    Go to Vertex AI Studio

  2. In the Tuning tab, find your model and click Monitor.

    The tuning metrics and checkpoints of your model are shown. In each metrics graph, checkpoint numbers are displayed as annotations as follows:

    • For each epoch, you see a step number and an epoch number.
    • The step number is the exact step when a checkpoint is saved.
    • The epoch number is an estimated epoch number that the checkpoint belongs to, except for the final checkpoint for a completed tuning job, which has the exact epoch number.

Google Gen AI SDK

from google import genai
from google.genai.types import HttpOptions

client = genai.Client(http_options=HttpOptions(api_version="v1"))

# Get the tuning job and the tuned model.
# E.g. tuning_job_name = "projects/123456789012/locations/us-central1/tuningJobs/123456789012345"
tuning_job = client.tunings.get(name=tuning_job_name)

if tuning_job.tuned_model.checkpoints:
    for i, checkpoint in enumerate(tuning_job.tuned_model.checkpoints):
        print(f"Checkpoint {i + 1}: ", checkpoint)
    # Example response:
    # Checkpoint 1:  checkpoint_id='1' epoch=1 step=10 endpoint='projects/123456789012/locations/us-central1/endpoints/123456789000000'
    # Checkpoint 2:  checkpoint_id='2' epoch=2 step=20 endpoint='projects/123456789012/locations/us-central1/endpoints/123456789012345'

View model details and checkpoints

You can view your tuned model in the Google Cloud console or use the Google Gen AI SDK to get model details, including endpoints and checkpoints.

The Endpoint field of the model is updated as follows:

  • It's updated based on the default checkpoint, and represents the endpoint that the tuning job created for the updated default checkpoint during tuning.
  • If a model isn't present, or if the tuning job fails to get a model, the Endpoint value is empty.
  • If the default checkpoint isn't deployed (because tuning is still in progress or because deployment has failed), the Endpoint value is empty.
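The three rules above can be read as a small decision function. The following is an illustrative sketch only: `resolve_endpoint_field` and its parameters are hypothetical names, not part of any SDK, and the function models the documented behavior rather than calling the service.

```python
from typing import Optional


def resolve_endpoint_field(
    model_exists: bool,
    default_checkpoint_deployed: bool,
    default_checkpoint_endpoint: Optional[str],
) -> str:
    """Return the value the Endpoint field is expected to show."""
    if not model_exists:
        # No model is present, or the tuning job failed to get one.
        return ""
    if not default_checkpoint_deployed or not default_checkpoint_endpoint:
        # Tuning is still in progress, or deployment has failed.
        return ""
    # Otherwise the field follows the default checkpoint's endpoint.
    return default_checkpoint_endpoint


# Endpoint name copied from the example responses on this page.
endpoint = "projects/123456789012/locations/us-central1/endpoints/123456789012345"
print(repr(resolve_endpoint_field(True, True, endpoint)))
print(repr(resolve_endpoint_field(True, False, endpoint)))
print(repr(resolve_endpoint_field(False, False, None)))
```

In practice this means code that reads the endpoint should check for an empty value before sending prediction requests.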

Console

You can view your tuned model in the Vertex AI Model Registry and on the Online prediction Endpoints page.

  1. Go to the Model Registry page from the Vertex AI section in the Google Cloud console.

    Go to the Model Registry page

  2. Click the name of your model.

    The default version of your model appears.

  3. Click the Version details tab to see information about your model version.

    Note that the Objective is Large model, the Model type is Foundation, and the Source is Vertex AI Studio tuning.

  4. Click the Deploy & test tab to see the endpoint where the model is deployed.

  5. Click the endpoint name to go to the Endpoint page to see the list of checkpoints that are deployed to the endpoint. For each checkpoint, the model version ID and checkpoint ID are displayed. The default checkpoint is indicated by the word default next to the checkpoint ID.

You can also view the checkpoints on the Tuning Job Details page. To see this page, go to the Tuning page and click one of the tuning jobs.

Go to the Tuning page

Google Gen AI SDK

from google import genai
from google.genai.types import HttpOptions

client = genai.Client(http_options=HttpOptions(api_version="v1"))

# Get the tuning job and the tuned model.
# E.g. tuning_job_name = "projects/123456789012/locations/us-central1/tuningJobs/123456789012345"
tuning_job = client.tunings.get(name=tuning_job_name)
tuned_model = client.models.get(model=tuning_job.tuned_model.model)
print(tuned_model)
# Example response:
# Model(name='projects/123456789012/locations/us-central1/models/1234567890@1', ...)

print(f"Default checkpoint: {tuned_model.default_checkpoint_id}")
# Example response:
# Default checkpoint: 2

if tuned_model.checkpoints:
    for checkpoint in tuned_model.checkpoints:
        print(f"Checkpoint {checkpoint.checkpoint_id}: ", checkpoint)
    # Example response:
    # Checkpoint 1:  checkpoint_id='1' epoch=1 step=10
    # Checkpoint 2:  checkpoint_id='2' epoch=2 step=20
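To mirror the console, which marks the default checkpoint with the word default, the listing can be annotated in the same way. The sketch below uses a local dataclass, `CheckpointInfo`, as a stand-in for the checkpoint objects returned by the SDK; the class and the `describe_checkpoints` helper are hypothetical, and the field names and values are copied from the example responses on this page.

```python
from dataclasses import dataclass


@dataclass
class CheckpointInfo:
    """Local stand-in with the fields shown in the example responses."""
    checkpoint_id: str
    epoch: int
    step: int


def describe_checkpoints(checkpoints, default_checkpoint_id):
    """Return one line per checkpoint, flagging the default one."""
    lines = []
    for cp in checkpoints:
        marker = " (default)" if cp.checkpoint_id == default_checkpoint_id else ""
        lines.append(f"Checkpoint {cp.checkpoint_id}{marker}: epoch={cp.epoch} step={cp.step}")
    return lines


checkpoints = [
    CheckpointInfo(checkpoint_id="1", epoch=1, step=10),
    CheckpointInfo(checkpoint_id="2", epoch=2, step=20),
]
rows = describe_checkpoints(checkpoints, default_checkpoint_id="2")
for row in rows:
    print(row)
# Checkpoint 1: epoch=1 step=10
# Checkpoint 2 (default): epoch=2 step=20
```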

If you configured the Gen AI evaluation service to run evaluations after each checkpoint, view the Cloud Storage bucket you configured for evaluation results.

Test the checkpoints

You can view a list of checkpoints in the Vertex AI Model Registry and test each one. Alternatively, you can use the Google Gen AI SDK to list and test your checkpoints.

Console

  1. To locate your tuned model in the Google Cloud console, go to the Vertex AI Studio page.

    Go to Vertex AI Studio

  2. In the Tuning tab, find your model and click Monitor.

  3. In the checkpoint table in the Monitor pane, next to the desired checkpoint, click the Test link.

Google Gen AI SDK

from google import genai
from google.genai.types import HttpOptions

client = genai.Client(http_options=HttpOptions(api_version="v1"))

# Get the tuning job and the tuned model.
# E.g. tuning_job_name = "projects/123456789012/locations/us-central1/tuningJobs/123456789012345"
tuning_job = client.tunings.get(name=tuning_job_name)

contents = "Why is the sky blue?"

# Predicts with the default checkpoint.
response = client.models.generate_content(
    model=tuning_job.tuned_model.endpoint,
    contents=contents,
)
print(response.text)
# Example response:
# The sky is blue because ...

# Predicts with Checkpoint 1.
checkpoint1_response = client.models.generate_content(
    model=tuning_job.tuned_model.checkpoints[0].endpoint,
    contents=contents,
)
print(checkpoint1_response.text)
# Example response:
# The sky is blue because ...

# Predicts with Checkpoint 2.
checkpoint2_response = client.models.generate_content(
    model=tuning_job.tuned_model.checkpoints[1].endpoint,
    contents=contents,
)
print(checkpoint2_response.text)
# Example response:
# The sky is blue because ...
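The sample selects checkpoints by list position. When you want a specific checkpoint, looking it up by its ID may be less error-prone. The following is a minimal sketch under stated assumptions: `CheckpointEndpoint` is a local stand-in for the SDK's checkpoint objects, and `endpoint_for_checkpoint` is a hypothetical helper. The field names and endpoint values come from the example responses on this page.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class CheckpointEndpoint:
    """Local stand-in holding only the fields this lookup needs."""
    checkpoint_id: str
    endpoint: str


def endpoint_for_checkpoint(checkpoints: Iterable[CheckpointEndpoint], checkpoint_id: str) -> str:
    """Return the endpoint of the checkpoint with the given ID."""
    for cp in checkpoints:
        if cp.checkpoint_id == checkpoint_id:
            if not cp.endpoint:
                # The checkpoint exists but has no endpoint recorded.
                raise ValueError(f"Checkpoint {checkpoint_id} has no endpoint")
            return cp.endpoint
    raise KeyError(f"No checkpoint with ID {checkpoint_id}")


sample_checkpoints = [
    CheckpointEndpoint("1", "projects/123456789012/locations/us-central1/endpoints/123456789000000"),
    CheckpointEndpoint("2", "projects/123456789012/locations/us-central1/endpoints/123456789012345"),
]
print(endpoint_for_checkpoint(sample_checkpoints, "1"))
```

The returned string could then be passed as the `model` argument of `generate_content`, in the same way the sample passes `checkpoints[0].endpoint`.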

Select a new default checkpoint

You can use the default checkpoint to represent the best performing checkpoint. By default, the default checkpoint is the final checkpoint of a tuning job.

When deploying a model with checkpoints, the default checkpoint is deployed.

When copying a model with checkpoints, the destination model has the same default checkpoint ID as the source model. All checkpoints are copied, so you can select a new default checkpoint for the destination model.

The tuning job endpoint is updated if you update the default checkpoint, and you can use the new endpoint for inference.
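One way to decide which checkpoint to promote is to compare a validation metric across checkpoints and take the best one. The sketch below assumes you have already collected a validation loss per checkpoint ID, for example from the metrics graphs or your evaluation results. The helper `pick_best_checkpoint` and the loss values are made up for illustration, and the choice of metric is an assumption rather than something this page prescribes.

```python
def pick_best_checkpoint(validation_loss_by_checkpoint: dict[str, float]) -> str:
    """Return the checkpoint ID with the lowest validation loss.

    On a tie, the checkpoint that appears first in the mapping wins.
    """
    if not validation_loss_by_checkpoint:
        raise ValueError("No checkpoints to compare")
    return min(validation_loss_by_checkpoint, key=validation_loss_by_checkpoint.get)


# Hypothetical losses: they fall until checkpoint 3, then rise again,
# which is the usual sign of overfitting in later checkpoints.
losses = {"1": 0.92, "2": 0.71, "3": 0.64, "4": 0.69, "5": 0.80}
best_checkpoint_id = pick_best_checkpoint(losses)
print(best_checkpoint_id)  # 3
```

The resulting ID is the kind of value the SDK sample in this section passes as `default_checkpoint_id`.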

Console

  1. To locate your tuned model in the Google Cloud console, go to the Vertex AI Studio page.

    Go to Vertex AI Studio

  2. In the Tuning tab, find your model and click Monitor.

  3. In the checkpoint table in the Monitor pane, next to the desired checkpoint, click Actions and select Set as default.

  4. Click Confirm.

    The metrics graphs and checkpoint table are updated to show the new default checkpoint. The endpoint in the TuningJob details page is updated to show the Endpoint of the new default checkpoint.

Google Gen AI SDK

from google import genai
from google.genai.types import HttpOptions, UpdateModelConfig

client = genai.Client(http_options=HttpOptions(api_version="v1"))

# Get the tuning job and the tuned model.
# E.g. tuning_job_name = "projects/123456789012/locations/us-central1/tuningJobs/123456789012345"
tuning_job = client.tunings.get(name=tuning_job_name)
tuned_model = client.models.get(model=tuning_job.tuned_model.model)

print(f"Default checkpoint: {tuned_model.default_checkpoint_id}")
print(f"Tuned model endpoint: {tuning_job.tuned_model.endpoint}")
# Example response:
# Default checkpoint: 2
# Tuned model endpoint: projects/123456789012/locations/us-central1/endpoints/123456789012345

# Set a new default checkpoint.
# E.g. checkpoint_id = "1"
tuned_model = client.models.update(
    model=tuned_model.name,
    config=UpdateModelConfig(default_checkpoint_id=checkpoint_id),
)

# Re-fetch the tuning job so the local object reflects the updated endpoint.
tuning_job = client.tunings.get(name=tuning_job_name)

print(f"Default checkpoint: {tuned_model.default_checkpoint_id}")
print(f"Tuned model endpoint: {tuning_job.tuned_model.endpoint}")
# Example response:
# Default checkpoint: 1
# Tuned model endpoint: projects/123456789012/locations/us-central1/endpoints/123456789000000

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-12-15 UTC.