Overview of creating managed datasets on Vertex AI

You can use a managed dataset to provide the source data used to train AutoML and custom models on Vertex AI. A managed dataset is required for AutoML and is optional for custom training.

Permissions and access control

When you use data from a Cloud Storage bucket to create a dataset, Vertex AI requires permissions to access the data. Vertex AI uses a special Google-managed service account known as a Service Agent to securely access your data. For more information on the roles required and how the Service Agent works, see Access control with IAM.
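As a rough illustration of what granting that access can involve, the sketch below builds the IAM member string for the Vertex AI Service Agent and a candidate bucket-level binding. The address pattern `service-PROJECT_NUMBER@gcp-sa-aiplatform.iam.gserviceaccount.com` and the choice of `roles/storage.objectViewer` are assumptions for illustration, and the helper names are hypothetical; confirm the actual agent and roles on the Access control with IAM page.

```python
def vertex_ai_service_agent(project_number: str) -> str:
    """Build the IAM member string for the Vertex AI Service Agent.

    Assumption: the agent address follows the pattern
    service-PROJECT_NUMBER@gcp-sa-aiplatform.iam.gserviceaccount.com.
    """
    if not project_number.isdigit():
        # The pattern uses the numeric project number, not the project ID.
        raise ValueError("expected a numeric project number, not a project ID")
    return (
        "serviceAccount:"
        f"service-{project_number}@gcp-sa-aiplatform.iam.gserviceaccount.com"
    )


def bucket_read_binding(project_number: str) -> dict:
    """Return an IAM binding that would grant read access to bucket objects.

    Assumption: roles/storage.objectViewer is used here only as an example
    of a read-only role; your setup may require different roles.
    """
    return {
        "role": "roles/storage.objectViewer",
        "members": [vertex_ai_service_agent(project_number)],
    }


if __name__ == "__main__":
    # 123456789012 is a placeholder project number.
    print(bucket_read_binding("123456789012"))
```

The binding is only a data structure; applying it to a bucket still has to be done through the console, the gcloud CLI, or a Cloud Storage client.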

Create a managed dataset for AutoML models

You can create managed datasets for training AutoML models by using the Google Cloud console or the Vertex AI API. The instructions vary slightly based on your data type and model objective. Start by preparing your training data.
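For the API route, a minimal sketch using the Vertex AI SDK for Python (the `google-cloud-aiplatform` package) might look like the following. It assumes an image dataset for single-label classification whose import file already sits in Cloud Storage; the project, location, display name, and URIs are placeholders, and `validate_gcs_uris` and `create_image_dataset` are hypothetical helpers, not part of the SDK.

```python
from typing import List


def validate_gcs_uris(uris: List[str]) -> List[str]:
    """Check that every import file is a Cloud Storage URI (gs://bucket/...)."""
    if not uris:
        raise ValueError("at least one Cloud Storage URI is required")
    for uri in uris:
        if not uri.startswith("gs://") or len(uri) <= len("gs://"):
            raise ValueError(f"not a Cloud Storage URI: {uri!r}")
    return list(uris)


def create_image_dataset(project: str, location: str,
                         display_name: str, gcs_uris: List[str]):
    """Create a managed image dataset and import data into it.

    Requires the google-cloud-aiplatform package and valid credentials;
    the import is deferred so the validation helper works without them.
    """
    from google.cloud import aiplatform

    aiplatform.init(project=project, location=location)
    return aiplatform.ImageDataset.create(
        display_name=display_name,
        gcs_source=validate_gcs_uris(gcs_uris),
        # Assumption: single-label classification is the model objective.
        import_schema_uri=(
            aiplatform.schema.dataset.ioformat.image
            .single_label_classification
        ),
    )


if __name__ == "__main__":
    # Placeholder values; replace them with your own before running.
    dataset = create_image_dataset(
        project="my-project",
        location="us-central1",
        display_name="my-image-dataset",
        gcs_uris=["gs://my-bucket/import_file.csv"],
    )
    print(dataset.resource_name)
```

Other data types and objectives use a different dataset class or import schema, which is why the detailed instructions differ per type.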

Image

Learn how to create a managed dataset for the following types of image AutoML models:

Tabular

Learn how to create a managed dataset for the following types of tabular AutoML models:

Create a managed dataset for custom trained models

The instructions for creating a managed dataset for training custom models are the same, regardless of your data type or model objective.

For details, see Use managed datasets.
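As a hedged sketch of how a custom training script might pick up a managed dataset, the snippet below reads the environment variables that Vertex AI is understood to set for the training container (`AIP_DATA_FORMAT`, `AIP_TRAINING_DATA_URI`, `AIP_VALIDATION_DATA_URI`, `AIP_TEST_DATA_URI`). Treat the variable names as an assumption to confirm in Use managed datasets; `read_dataset_env` is a hypothetical helper.

```python
import os
from typing import Dict, Mapping, Optional


def read_dataset_env(
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, Optional[str]]:
    """Collect the managed-dataset locations passed to a training container.

    Assumption: Vertex AI exports these AIP_* variables when a managed
    dataset is attached to a custom training job. Missing variables map
    to None, so the script also runs locally without a managed dataset.
    """
    env = os.environ if env is None else env
    return {
        "data_format": env.get("AIP_DATA_FORMAT"),
        "training": env.get("AIP_TRAINING_DATA_URI"),
        "validation": env.get("AIP_VALIDATION_DATA_URI"),
        "test": env.get("AIP_TEST_DATA_URI"),
    }


if __name__ == "__main__":
    for split, value in read_dataset_env().items():
        print(f"{split}: {value}")
```

Passing a mapping explicitly instead of relying on `os.environ` keeps the helper easy to exercise outside a training job.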

View managed datasets using Dataplex Universal Catalog

Dataplex Universal Catalog is a fully managed, scalable metadata management service that provides a centralized location to search for datasets across projects and regions. It's integrated with Vertex AI and offers capabilities similar to those of the deprecated Data Catalog.

You can use Dataplex Universal Catalog to discover, understand, and enrich your data with aspects (which are similar to Data Catalog tags).

For details on managing metadata and aspects for your Vertex AI resources, see Manage aspects and enrich metadata in the Dataplex Universal Catalog.

Last updated 2026-02-18 UTC.