Containerize and run training code locally

You can use the gcloud ai custom-jobs local-run command to build a Docker container image based on your training code and run the image as a container on your local computer. This feature offers several benefits:

  • You can build a container image with minimal knowledge of Docker. You don't need to write your own Dockerfile. You can later push this image to Artifact Registry and use it for custom container training.

    For advanced use cases, you might still want to write your own Dockerfile.

  • Your container image can run a Python training application or a Bash script.

    You can use a Bash script to run training code written in another programming language (as long as you also specify a base container image that supports the other language).

  • Running a container locally executes your training code in a similar way to how it runs on Vertex AI.

    Running your code locally can help you debug problems with your code before you perform serverless training on Vertex AI.

Before you begin

  1. Set up your Vertex AI development environment.

  2. Install Docker Engine.

  3. If you are using Linux, configure Docker so you can run it without sudo.

    The local-run command requires this configuration in order to use Docker.

Use the local-run command

Run the following command to build a container image based on your training code and run a container locally:

gcloud ai custom-jobs local-run \
  --executor-image-uri=BASE_IMAGE_URI \
  --local-package-path=WORKING_DIRECTORY \
  --script=SCRIPT_PATH \
  --output-image-uri=OUTPUT_IMAGE_NAME

Replace the following:

  • BASE_IMAGE_URI: The URI of a Docker image to use as the base of the container. Choose a base image that includes dependencies required for your training code.

    You can use the URI of a prebuilt training container image or any other value that would be valid for a Dockerfile FROM instruction; for example, a publicly available Docker image or a Docker image in Artifact Registry that you have access to.

  • WORKING_DIRECTORY: The lowest-level directory in your file system that contains all your training code and local dependencies that you need to use for training.

    By default, the command only copies the parent directory of the file specified by the --script flag (see the following list item) into the resulting Docker image. The Docker image does not necessarily include all the files within WORKING_DIRECTORY; to customize which files get included, see the section in this document about including dependencies.

    If you omit the --local-package-path flag (and this placeholder), then the local-run command uses the current working directory for this value.

  • SCRIPT_PATH: The path, relative to WORKING_DIRECTORY on your local file system, to the script that is the entry point for your training code. This can be a Python script (ending in .py) or a Bash script.

    For example, if you want to run /hello-world/trainer/task.py and WORKING_DIRECTORY is /hello-world, then use trainer/task.py for this value.

    If you specify a Python script, then your base image must have Python installed, and if you specify a Bash script, then your base image must have Bash installed. (All prebuilt training containers and many other publicly available Docker images include both of these dependencies.)

    Use --python-module instead of --script

    If you omit the --script flag (and SCRIPT_PATH), then you must instead use the --python-module flag to specify the name of a Python module in WORKING_DIRECTORY to run as the entry point for training. For example, instead of --script=trainer/task.py, you might specify --python-module=trainer.task.

    In this case, the resulting Docker container loads your code as a module rather than as a script. You likely want to use this option if your entry point script imports other Python modules in WORKING_DIRECTORY.
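    To make the layout concrete, here is a minimal, hypothetical entry point; the /hello-world directory and the trainer package are illustrative assumptions, not values the command requires:

```python
# /hello-world/trainer/task.py -- a minimal, hypothetical training entry point.
# With --script=trainer/task.py, this file runs as a script; with
# --python-module=trainer.task, it is imported and run as a module, which is
# the more reliable choice when it imports sibling modules from the
# trainer package.


def train() -> str:
    # Placeholder for real training logic (load data, fit a model, and so on).
    return "training complete"


if __name__ == "__main__":
    print(train())
```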

  • OUTPUT_IMAGE_NAME: A name for the resulting Docker image built by the command. You can use any value that is accepted by docker build's -t flag.

    If you plan to later push the image to Artifact Registry, then you might want to use an image name that meets the Artifact Registry requirements. (Alternatively, you can tag the image with additional names later.)

    If you omit the --output-image-uri flag (and this placeholder), then the local-run command tags the image with a name based on the current time and the filename of your entry point script.

The command builds a Docker container image based on your configuration. After building the image, the command prints the following output:

A training image is built.
Starting to run ...

The command then immediately uses this container image to run a container on your local computer. When the container exits, the command prints the following output:

A local run is finished successfully using custom image: OUTPUT_IMAGE_NAME

Additional options

The following sections describe additional options that you can use to customize the behavior of the local-run command.

Install dependencies

Your training code can rely on any dependencies installed on your base image (for example, prebuilt training container images include many Python libraries for machine learning), as well as any files that you include in the Docker image created by the local-run command.

When you specify a script with the --script flag or the --python-module flag, the command copies the script's parent directory (and its subdirectories) into the Docker image. For example, if you specify --local-package-path=/hello-world and --script=trainer/task.py, then the command copies /hello-world/trainer/ into the Docker image.

You can also include additional Python dependencies or arbitrary files from your file system by completing the extra steps described in one of the following sections:

Install additional Python dependencies

You can include additional Python dependencies in the Docker image in severalways:

Use a requirements.txt file

If there is a file named requirements.txt in the working directory, then the local-run command treats this as a pip requirements file and uses it to install Python dependencies in the Docker image.

Use a setup.py file

If there is a file named setup.py in the working directory, then the local-run command treats this as a Python setup.py file, copies the file to the Docker image, and runs pip install on the directory in the Docker image that contains this file.

You can, for example, add an install_requires argument to setup.py in order to install Python dependencies in the Docker image.
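For illustration, a setup.py like the following would cause the local-run command to install the listed packages into the image; the package name and the dependency list are hypothetical:

```python
# setup.py in the working directory (hypothetical package name and dependencies).
from setuptools import find_packages, setup

setup(
    name="trainer",
    version="0.1.0",
    packages=find_packages(),
    # pip installs these into the Docker image when it installs this package.
    install_requires=["pandas>=1.3", "scikit-learn"],
)
```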

Specify individual PyPI dependencies

You can use the --requirements flag to install specific dependencies from PyPI in the Docker image. For example:

gcloud ai custom-jobs local-run \
  --executor-image-uri=BASE_IMAGE_URI \
  --local-package-path=WORKING_DIRECTORY \
  --script=SCRIPT_PATH \
  --output-image-uri=OUTPUT_IMAGE_NAME \
  --requirements=REQUIREMENTS

Replace REQUIREMENTS with a comma-separated list of Python requirement specifiers; for example, pandas==1.5.3,scikit-learn.

Specify additional local Python dependencies

You can use the --extra-packages flag to install specific local Python dependencies. These Python dependencies must be under the working directory, and each dependency must be in a format that pip install supports; for example, a wheel file or a Python source distribution.

For example:

gcloud ai custom-jobs local-run \
  --executor-image-uri=BASE_IMAGE_URI \
  --local-package-path=WORKING_DIRECTORY \
  --script=SCRIPT_PATH \
  --output-image-uri=OUTPUT_IMAGE_NAME \
  --extra-packages=LOCAL_DEPENDENCIES

Replace LOCAL_DEPENDENCIES with a comma-separated list of local file paths, expressed relative to the working directory.

Include other files

To copy additional directories to the Docker image (without installing them as Python dependencies), you can use the --extra-dirs flag. You may only specify directories under the working directory. For example:

gcloud ai custom-jobs local-run \
  --executor-image-uri=BASE_IMAGE_URI \
  --local-package-path=WORKING_DIRECTORY \
  --script=SCRIPT_PATH \
  --output-image-uri=OUTPUT_IMAGE_NAME \
  --extra-dirs=EXTRA_DIRECTORIES

Replace EXTRA_DIRECTORIES with a comma-separated list of local directories, expressed relative to the working directory.

Training application arguments

If the entry point script for your training application expects command-line arguments, you can specify these when you run the local-run command. These arguments do not get saved in the Docker image; rather, they get passed as arguments when the image runs as a container.

To pass arguments to your entry point script, pass the -- argument followed by your script's arguments to the local-run command after all the command's other flags.

For example, imagine a script that you run locally with the following command:

python /hello-world/trainer/task.py \
  --learning_rate=0.1 \
  --input_data=gs://BUCKET/small-dataset/

When you use the local-run command, you can use the following flags to run the script in the container with the same arguments:

gcloud ai custom-jobs local-run \
  --executor-image-uri=BASE_IMAGE_URI \
  --local-package-path=/hello-world \
  --script=trainer/task.py \
  --output-image-uri=OUTPUT_IMAGE_NAME \
  -- \
  --learning_rate=0.1 \
  --input_data=gs://BUCKET/small-dataset/
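An entry point that accepts such flags might parse them with argparse; the following is an illustrative sketch (the flag names match the example above, while the default value and help text are assumptions):

```python
# Hypothetical sketch of how trainer/task.py might parse the arguments that
# the local-run command passes through after the `--` separator.
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Toy training entry point")
    parser.add_argument("--learning_rate", type=float, default=0.01)
    parser.add_argument("--input_data", required=True,
                        help="Cloud Storage URI of the training dataset")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(f"training with learning_rate={args.learning_rate} "
          f"on {args.input_data}")
```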

Accelerate model training with GPUs

If you want to eventually deploy the Docker image created by the local-run command to Vertex AI and use GPUs for training, then make sure to write training code that takes advantage of GPUs and use a GPU-enabled Docker image for the value of the --executor-image-uri flag. For example, you can use one of the prebuilt training container images that supports GPUs.
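For instance, GPU-aware training code typically probes for a device at startup and falls back to the CPU; a minimal sketch assuming PyTorch as the framework (the framework choice is an assumption, and the fallback keeps the same code runnable on a CPU-only machine):

```python
def pick_device() -> str:
    # Prefer a GPU when the framework reports one; otherwise use the CPU.
    # PyTorch is assumed here; other frameworks expose similar checks.
    try:
        import torch
        return "cuda" if torch.cuda.is_available() else "cpu"
    except ImportError:
        # Framework not installed (for example, on a minimal base image).
        return "cpu"


if __name__ == "__main__":
    print(f"training on {pick_device()}")
```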

If your local computer runs Linux and has GPUs, you can also configure the local-run command to use your GPUs when it runs a container locally. This is optional, but it can be useful if you want to test how your training code works with GPUs. Do the following:

  1. Install the NVIDIA Container Toolkit (nvidia-docker) on your local computer, if you haven't already.

  2. Specify the --gpu flag when you run the local-run command. For example:

    gcloud ai custom-jobs local-run \
      --executor-image-uri=BASE_IMAGE_URI \
      --local-package-path=WORKING_DIRECTORY \
      --script=SCRIPT_PATH \
      --output-image-uri=OUTPUT_IMAGE_NAME \
      --gpu

Specify a custom service account

By default, when the local-run command runs your training code in a local container, it mounts the Google Cloud credentials available in your local environment through Application Default Credentials (ADC) into the container, so that your training code can use ADC to authenticate with the same credentials. In other words, the credentials available through ADC in your local shell are also available through ADC to your code when you run the local-run command.

You can use the gcloud auth application-default login command to use your user account for ADC, or you can set an environment variable in your shell to use a service account for ADC.

If you want the container to run with Google Cloud credentials other than those available through ADC in your local shell, do the following:

  1. Create or select a service account with the permissions that you want your training code to have.

  2. Download a service account key for this service account to your local computer.

  3. When you run the local-run command, specify the --service-account-key-file flag. For example:

    gcloud ai custom-jobs local-run \
      --executor-image-uri=BASE_IMAGE_URI \
      --local-package-path=WORKING_DIRECTORY \
      --script=SCRIPT_PATH \
      --output-image-uri=OUTPUT_IMAGE_NAME \
      --service-account-key-file=KEY_PATH

    Replace KEY_PATH with the path to the service account key in your local file system. This path must be absolute or relative to the current working directory of your shell, not relative to the directory specified by the --local-package-path flag.

In the resulting container, your training code can use ADC to authenticate with the credentials of the specified service account.

Comparison to training on Vertex AI

When you perform serverless training on Vertex AI, Vertex AI uses the Vertex AI Custom Code Service Agent for your project by default to run your code. You can also attach a different service account for serverless training.

When you use the local-run command, you can't authenticate as the Vertex AI Custom Code Service Agent, but you can create a service account with similar permissions and use it locally.

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-12-15 UTC.