Use Cloud Storage as a mounted file system

Cloud Storage FUSE lets you upload training data to a Cloud Storage bucket and access that data from your Vertex AI serverless training job like a mounted file system. Using Cloud Storage FUSE has the following benefits:

  • Training data is streamed to your training job instead of being downloaded to replicas, which can make data loading and setup tasks faster when the job starts running.
  • Training jobs can handle input and output at scale without making API calls, handling responses, or integrating with client-side libraries.
  • Cloud Storage FUSE provides high throughput for large sequential file reads and in distributed training scenarios.
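Because large sequential reads are where Cloud Storage FUSE performs well, a training job can read a large file front to back in fixed-size chunks instead of loading it into memory at once. The following is a minimal sketch of that access pattern; the 8 MiB chunk size and the object path in the usage note are illustrative assumptions, not tuned or documented values.

```python
from typing import Iterator

# Illustrative chunk size (an assumption, not a recommended value).
CHUNK_SIZE = 8 * 1024 * 1024


def iter_chunks(path: str, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield a file's contents sequentially in fixed-size chunks."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # End of file reached.
                break
            yield chunk


def count_bytes(path: str, chunk_size: int = CHUNK_SIZE) -> int:
    """Stream through a file once and return its total size in bytes."""
    return sum(len(chunk) for chunk in iter_chunks(path, chunk_size))
```

For example, `count_bytes("/gcs/example-bucket/raw/video.bin")` would stream a hypothetical object through the mount in a single sequential pass.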

Use cases

We recommend using Cloud Storage for storing training data in the following situations:

  • Your training data is unstructured data, such as image, text, and video.
  • Your training data is structured data in a format such as TFRecord.
  • Your training data contains large files, such as raw video.
  • You use distributed training.

How it works

Serverless training jobs can access your Cloud Storage buckets as subdirectories of the root /gcs directory. For example, if your training data is located at gs://example-bucket/data.csv, you can read from and write to the bucket from your Python training application as follows:

Read from the bucket

with open('/gcs/example-bucket/data.csv', 'r') as f:
    lines = f.readlines()

Write to the bucket

with open('/gcs/example-bucket/epoch3.log', 'a') as f:
    f.write('success!\n')
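Because each bucket appears as a subdirectory of /gcs, a gs:// URI maps directly to a local path by replacing the gs:// scheme with the /gcs/ prefix. The following helper is a minimal sketch of that mapping; the function name is hypothetical and is not part of any Vertex AI or Cloud Storage library.

```python
from pathlib import PurePosixPath

# Root directory under which serverless training jobs expose buckets.
GCS_FUSE_ROOT = "/gcs"


def gcs_uri_to_fuse_path(uri: str, root: str = GCS_FUSE_ROOT) -> str:
    """Map a gs://bucket/object URI to its path under the FUSE mount."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a Cloud Storage URI: {uri!r}")
    bucket_and_object = uri[len(prefix):]
    if not bucket_and_object or bucket_and_object.startswith("/"):
        raise ValueError(f"missing bucket name in {uri!r}")
    # PurePosixPath joins the parts and normalizes repeated slashes.
    return str(PurePosixPath(root, bucket_and_object))
```

With this helper, `open(gcs_uri_to_fuse_path("gs://example-bucket/data.csv"))` opens the same file as the read example.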

Bucket access permissions

By default, a serverless training job can access any Cloud Storage bucket within the same Google Cloud project by using the Vertex AI Custom Code Service Agent. To control access to buckets, you can assign a custom service account to the job. In this case, access to a Cloud Storage bucket is granted based on the Cloud Storage roles that the custom service account holds.

For example, if you want to give the serverless training job read and write access to Bucket-A but only read access to Bucket-B, you can assign a custom service account that has the following roles to the job:

  • roles/storage.objectAdmin for Bucket-A
  • roles/storage.objectViewer for Bucket-B
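One way to grant these per-bucket roles is with the google-cloud-storage Python client, which exposes `Bucket.get_iam_policy` and `Bucket.set_iam_policy`. The following is a minimal sketch under stated assumptions: the bucket names are lowercase placeholders for Bucket-A and Bucket-B (real bucket names cannot contain uppercase letters), the service account email is hypothetical, and the identity running the code is assumed to be allowed to change bucket IAM policies.

```python
from typing import Dict

# Placeholder bucket names standing in for Bucket-A and Bucket-B.
ROLES_BY_BUCKET: Dict[str, str] = {
    "bucket-a": "roles/storage.objectAdmin",   # read and write
    "bucket-b": "roles/storage.objectViewer",  # read only
}


def grant_bucket_roles(client, service_account_email: str,
                       roles_by_bucket: Dict[str, str] = ROLES_BY_BUCKET) -> None:
    """Add one IAM binding per bucket for the given service account.

    `client` is expected to behave like google.cloud.storage.Client.
    """
    member = f"serviceAccount:{service_account_email}"
    for bucket_name, role in roles_by_bucket.items():
        bucket = client.bucket(bucket_name)
        # Request a version 3 policy so buckets that use IAM conditions work.
        policy = bucket.get_iam_policy(requested_policy_version=3)
        policy.bindings.append({"role": role, "members": {member}})
        bucket.set_iam_policy(policy)
```

Typical usage would be `grant_bucket_roles(storage.Client(), "trainer@example-project.iam.gserviceaccount.com")` after `from google.cloud import storage`; the service account is then assigned to the training job.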

If the training job attempts to write to Bucket-B, a "permission denied" error is returned.
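A training job can check up front whether it is allowed to write to a mounted bucket, rather than failing later in the run. The following is a minimal sketch, assuming that a write denied by IAM surfaces through the mount as an `OSError` (such as `PermissionError`); that error mapping is an assumption. The probe writes and deletes a small object, so it is best run once at startup.

```python
import os
import uuid


def can_write(directory: str) -> bool:
    """Return True if a small probe file can be created in `directory`."""
    probe = os.path.join(directory, f".write-probe-{uuid.uuid4().hex}")
    try:
        # The with block sits inside try, so an error raised when the file
        # is closed (and pushed to Cloud Storage) is also caught here.
        with open(probe, "w") as f:
            f.write("ok")
    except OSError:
        return False
    # Best-effort cleanup; a failed delete does not change the result.
    try:
        os.remove(probe)
    except OSError:
        pass
    return True
```

In the Bucket-A and Bucket-B example, `can_write("/gcs/bucket-a")` would be expected to return True and `can_write("/gcs/bucket-b")` to return False, where the bucket names are placeholders.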

For more information on Cloud Storage roles, see IAM roles for Cloud Storage.

Best practices

  • Avoid renaming directories. A rename operation is not atomic in Cloud Storage FUSE. If the operation is interrupted, some files remain in the old directory.
  • Avoid unnecessarily closing (close()) or flushing (flush()) files. Closing or flushing a file pushes it to Cloud Storage, which incurs a cost.
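To limit how often a file is closed and pushed to Cloud Storage, a job can collect output in memory and write it in batches. The following class is a minimal sketch of that approach; the class name and the default batch size of 100 lines are hypothetical choices, and a larger batch trades fewer pushes for more data lost if the job stops before the next flush.

```python
from typing import List


class BufferedLineWriter:
    """Collect lines in memory and append them to a file in batches.

    Each flush opens, writes, and closes the file once, so each batch is
    pushed to Cloud Storage once instead of once per line.
    """

    def __init__(self, path: str, max_lines: int = 100) -> None:
        self.path = path
        self.max_lines = max_lines
        self._buffer: List[str] = []

    def write(self, line: str) -> None:
        self._buffer.append(line if line.endswith("\n") else line + "\n")
        if len(self._buffer) >= self.max_lines:
            self.flush()

    def flush(self) -> None:
        if not self._buffer:
            return
        with open(self.path, "a") as f:
            f.writelines(self._buffer)
        self._buffer.clear()

    def __enter__(self) -> "BufferedLineWriter":
        return self

    def __exit__(self, *exc_info) -> None:
        # Write any remaining lines when the with block ends.
        self.flush()
```

For example, `with BufferedLineWriter("/gcs/example-bucket/epoch3.log") as log:` followed by `log.write("success!")` calls appends at most one batch per 100 lines, plus a final batch on exit.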

Performance optimization guidelines

To get optimal read throughput when using Cloud Storage as a file system, we recommend implementing the following guidelines:

  • To reduce the latency introduced by looking up and opening objects in a bucket, store data in larger and fewer files.
  • Use distributed training to maximize bandwidth utilization.
  • Cache frequently accessed files to improve read performance. For details, see Overview of caching in Cloud Storage FUSE.
  • Use local storage for checkpointing and logs instead of Cloud Storage.
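The first two guidelines can be combined: pack many small files into fewer, larger shards before uploading them, then give each worker a disjoint subset of the shards. The following is a minimal sketch under stated assumptions: the tar format, the shard naming, and the default of 1,000 files per shard are illustrative choices (a format such as TFRecord serves the same purpose), and `rank` and `world_size` stand for values that your distributed training framework provides.

```python
import os
import tarfile
from typing import List, Sequence


def pack_into_shards(paths: Sequence[str], out_dir: str,
                     files_per_shard: int = 1000) -> List[str]:
    """Pack many small files into fewer, larger tar shards.

    Returns the shard paths in order. Members are stored under their base
    names, so input base names are assumed to be unique.
    """
    if files_per_shard < 1:
        raise ValueError("files_per_shard must be at least 1")
    os.makedirs(out_dir, exist_ok=True)
    shard_paths = []
    for start in range(0, len(paths), files_per_shard):
        index = start // files_per_shard
        shard_path = os.path.join(out_dir, f"shard-{index:05d}.tar")
        with tarfile.open(shard_path, "w") as tar:
            for path in paths[start:start + files_per_shard]:
                tar.add(path, arcname=os.path.basename(path))
        shard_paths.append(shard_path)
    return shard_paths


def shards_for_worker(shards: Sequence[str], rank: int,
                      world_size: int) -> List[str]:
    """Assign shards round-robin so each worker reads a disjoint subset."""
    if not 0 <= rank < world_size:
        raise ValueError("rank must be in [0, world_size)")
    return list(shards[rank::world_size])
```

Packing would run before upload, on the machine that holds the raw files; each worker would then read only the shards returned by `shards_for_worker` through the /gcs mount.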

Limitations

To learn about the limitations of Cloud Storage FUSE, including the differences between Cloud Storage FUSE and POSIX file systems, see Limitations and differences from POSIX file systems.

Use Cloud Storage FUSE

To use Cloud Storage FUSE for serverless training, do the following:

  1. Create a Cloud Storage bucket. Note that dual-region and multi-region buckets are not supported for serverless training.
  2. Upload your training data to the bucket. For details, see Uploads.

    To learn about other options for transferring data to Cloud Storage, see Data transfer options.

  3. Install Cloud Storage FUSE.

  4. Use the Cloud Storage file system.
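Steps 1 and 2 can be scripted with the google-cloud-storage Python client, using `Client.create_bucket`, the bucket's `location_type` property, and `Blob.upload_from_filename`. The following is a minimal sketch under stated assumptions: the bucket name, region, local directory, and object prefix are placeholders, and the check simply rejects any bucket whose location type is not a single region.

```python
import os
from typing import List

# Serverless training does not support dual-region or multi-region buckets.
SUPPORTED_LOCATION_TYPE = "region"


def create_training_bucket(client, bucket_name: str, region: str):
    """Create a single-region bucket and verify its location type."""
    bucket = client.create_bucket(bucket_name, location=region)
    if bucket.location_type != SUPPORTED_LOCATION_TYPE:
        raise ValueError(f"{bucket_name} is {bucket.location_type!r}; "
                         "use a single-region bucket for serverless training")
    return bucket


def upload_directory(bucket, local_dir: str, prefix: str = "") -> List[str]:
    """Upload every file under local_dir and return the object names."""
    names = []
    for root, _dirs, files in os.walk(local_dir):
        for filename in sorted(files):
            local_path = os.path.join(root, filename)
            # Object names always use "/" as the separator.
            rel = os.path.relpath(local_path, local_dir).replace(os.sep, "/")
            name = f"{prefix.rstrip('/')}/{rel}" if prefix else rel
            bucket.blob(name).upload_from_filename(local_path)
            names.append(name)
    return names
```

For example, after `bucket = create_training_bucket(storage.Client(), "example-bucket", "us-central1")` and `upload_directory(bucket, "./data", prefix="data/")`, a local file ./data/a.csv would be readable by the training job at /gcs/example-bucket/data/a.csv.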


Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-12-15 UTC.