Create a dataset for training image classification models

This page shows you how to create a Vertex AI dataset from your image data so you can start training classification models. You can create a dataset using either the Google Cloud console or the Vertex AI API.

Create an empty dataset and import or associate your data

Google Cloud console

Use the following instructions to create an empty dataset and either import or associate your data.

  1. In the Google Cloud console, in the Vertex AI section, go to the Datasets page.

    Go to the Datasets page

  2. Click Create to open the create dataset details page.
  3. Modify the Dataset name field to create a descriptive dataset display name.
  4. Select the Image tab.
  5. Select single-label or multi-label image classification as your objective.
  6. Select a region from the Region drop-down list.
  7. Click Create to create your empty dataset, and advance to the data import page.
  8. Choose one of the following options from the Select an import method section:

    Upload data from your computer

    1. In the Select an import method section, choose to upload data from your computer.
    2. Click Select files and choose all the local files to upload to a Cloud Storage bucket.
    3. In the Select a Cloud Storage path section, click Browse to choose a Cloud Storage bucket location to upload your data to.

    Upload an import file from your computer

    1. Click Upload an import file from your computer.
    2. Click Select files and choose the local import file to upload to a Cloud Storage bucket.
    3. In the Select a Cloud Storage path section, click Browse to choose a Cloud Storage bucket location to upload your file to.

    Select an import file from Cloud Storage

    1. Click Select an import file from Cloud Storage.
    2. In the Select a Cloud Storage path section, click Browse to choose the import file in Cloud Storage.
  9. Click Continue.

    Data import can take several hours, depending on the size of your data. You can close this tab and return to it later. You will receive an email when your data is imported.
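For orientation, an import file for single-label classification lists one image per line: a Cloud Storage URI followed by its label (an optional ML_USE column such as training or test can come first). The bucket and labels below are hypothetical; see Preparing image data for the authoritative format and limits.

```csv
gs://my-bucket/flowers/daisy_001.jpg,daisy
gs://my-bucket/flowers/rose_004.jpg,rose
```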

API

To create a machine learning model, you must first have a representative collection of data to train with. After importing data, you can make modifications and start model training.

Create a dataset

Use the following samples to create a dataset for your data.

REST

Before using any of the request data, make the following replacements:

  • LOCATION: Region where the dataset will be stored. This must be a region that supports dataset resources. For example, us-central1. See List of available locations.
  • PROJECT_ID: Your project ID.
  • DATASET_NAME: Name for the dataset.

HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/datasets

Request JSON body:

{
  "display_name": "DATASET_NAME",
  "metadata_schema_uri": "gs://google-cloud-aiplatform/schema/dataset/metadata/image_1.0.0.yaml"
}

To send your request, choose one of these options:

curl

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you into the gcloud CLI. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/datasets"

PowerShell

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/datasets" | Select-Object -Expand Content

You should see output similar to the following. You can use the OPERATION_ID in the response to get the status of the operation.

{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION/datasets/DATASET_ID/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.aiplatform.v1.CreateDatasetOperationMetadata",
    "genericMetadata": {
      "createTime": "2020-07-07T21:27:35.964882Z",
      "updateTime": "2020-07-07T21:27:35.964882Z"
    }
  }
}
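The name field in this response embeds every identifier you need later. As a minimal sketch (not part of the Vertex AI SDK), you can split the resource name into its segment pairs with plain string handling:

```python
def parse_operation_name(name: str) -> dict:
    """Split a Vertex AI operation resource name into its components.

    Expected shape:
    projects/{project}/locations/{location}/datasets/{dataset}/operations/{operation}
    """
    parts = name.split("/")
    # Pair up alternating segments: ["projects", "123", "locations", "us-central1", ...]
    return dict(zip(parts[::2], parts[1::2]))


# Hypothetical operation name in the shape shown above.
name = (
    "projects/123456789/locations/us-central1/"
    "datasets/1122334455/operations/5566778899"
)
ids = parse_operation_name(name)
print(ids["operations"])  # the OPERATION_ID
print(ids["datasets"])    # the DATASET_ID
```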

Terraform

The following sample uses the google_vertex_ai_dataset Terraform resource to create an image dataset named image-dataset.

To learn how to apply or remove a Terraform configuration, see Basic Terraform commands.

resource "google_vertex_ai_dataset" "image_dataset" {
  display_name        = "image-dataset"
  metadata_schema_uri = "gs://google-cloud-aiplatform/schema/dataset/metadata/image_1.0.0.yaml"
  region              = "us-central1"
}

Java

Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

import com.google.api.gax.longrunning.OperationFuture;
import com.google.cloud.aiplatform.v1.CreateDatasetOperationMetadata;
import com.google.cloud.aiplatform.v1.Dataset;
import com.google.cloud.aiplatform.v1.DatasetServiceClient;
import com.google.cloud.aiplatform.v1.DatasetServiceSettings;
import com.google.cloud.aiplatform.v1.LocationName;
import java.io.IOException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class CreateDatasetImageSample {

  public static void main(String[] args)
      throws IOException, InterruptedException, ExecutionException, TimeoutException {
    // TODO(developer): Replace these variables before running the sample.
    String project = "YOUR_PROJECT_ID";
    String datasetDisplayName = "YOUR_DATASET_DISPLAY_NAME";
    createDatasetImageSample(project, datasetDisplayName);
  }

  static void createDatasetImageSample(String project, String datasetDisplayName)
      throws IOException, InterruptedException, ExecutionException, TimeoutException {
    DatasetServiceSettings datasetServiceSettings =
        DatasetServiceSettings.newBuilder()
            .setEndpoint("us-central1-aiplatform.googleapis.com:443")
            .build();

    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (DatasetServiceClient datasetServiceClient =
        DatasetServiceClient.create(datasetServiceSettings)) {
      String location = "us-central1";
      String metadataSchemaUri =
          "gs://google-cloud-aiplatform/schema/dataset/metadata/image_1.0.0.yaml";
      LocationName locationName = LocationName.of(project, location);
      Dataset dataset =
          Dataset.newBuilder()
              .setDisplayName(datasetDisplayName)
              .setMetadataSchemaUri(metadataSchemaUri)
              .build();

      OperationFuture<Dataset, CreateDatasetOperationMetadata> datasetFuture =
          datasetServiceClient.createDatasetAsync(locationName, dataset);
      System.out.format("Operation name: %s\n", datasetFuture.getInitialFuture().get().getName());
      System.out.println("Waiting for operation to finish...");
      Dataset datasetResponse = datasetFuture.get(120, TimeUnit.SECONDS);

      System.out.println("Create Image Dataset Response");
      System.out.format("Name: %s\n", datasetResponse.getName());
      System.out.format("Display Name: %s\n", datasetResponse.getDisplayName());
      System.out.format("Metadata Schema Uri: %s\n", datasetResponse.getMetadataSchemaUri());
      System.out.format("Metadata: %s\n", datasetResponse.getMetadata());
      System.out.format("Create Time: %s\n", datasetResponse.getCreateTime());
      System.out.format("Update Time: %s\n", datasetResponse.getUpdateTime());
      System.out.format("Labels: %s\n", datasetResponse.getLabelsMap());
    }
  }
}

Node.js

Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

/**
 * TODO(developer): Uncomment these variables before running the sample.
 * (Not necessary if passing values as arguments)
 */
// const datasetDisplayName = 'YOUR_DATASET_DISPLAY_NAME';
// const project = 'YOUR_PROJECT_ID';
// const location = 'YOUR_PROJECT_LOCATION';

// Imports the Google Cloud Dataset Service Client library
const {DatasetServiceClient} = require('@google-cloud/aiplatform');

// Specifies the location of the api endpoint
const clientOptions = {
  apiEndpoint: 'us-central1-aiplatform.googleapis.com',
};

// Instantiates a client
const datasetServiceClient = new DatasetServiceClient(clientOptions);

async function createDatasetImage() {
  // Configure the parent resource
  const parent = `projects/${project}/locations/${location}`;
  // Configure the dataset resource
  const dataset = {
    displayName: datasetDisplayName,
    metadataSchemaUri:
      'gs://google-cloud-aiplatform/schema/dataset/metadata/image_1.0.0.yaml',
  };
  const request = {
    parent,
    dataset,
  };

  // Create Dataset Request
  const [response] = await datasetServiceClient.createDataset(request);
  console.log(`Long running operation: ${response.name}`);

  // Wait for operation to complete
  await response.promise();
  const result = response.result;

  console.log('Create dataset image response');
  console.log(`Name : ${result.name}`);
  console.log(`Display name : ${result.displayName}`);
  console.log(`Metadata schema uri : ${result.metadataSchemaUri}`);
  console.log(`Metadata : ${JSON.stringify(result.metadata)}`);
  console.log(`Labels : ${JSON.stringify(result.labels)}`);
}
createDatasetImage();

Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

The following sample uses the Vertex AI SDK for Python to both create a dataset and import data. If you run this sample code, then you can skip the Import data section of this guide.

This particular sample imports data for single-label classification. If your model has a different objective, then you must adjust the code.

from typing import List, Union

from google.cloud import aiplatform


def create_and_import_dataset_image_sample(
    project: str,
    location: str,
    display_name: str,
    src_uris: Union[str, List[str]],
    sync: bool = True,
):
    """
    src_uris -- a string or list of strings, e.g.
        ["gs://bucket1/source1.jsonl", "gs://bucket7/source4.jsonl"]
    """
    aiplatform.init(project=project, location=location)

    ds = aiplatform.ImageDataset.create(
        display_name=display_name,
        gcs_source=src_uris,
        import_schema_uri=aiplatform.schema.dataset.ioformat.image.single_label_classification,
        sync=sync,
    )

    ds.wait()

    print(ds.display_name)
    print(ds.resource_name)
    return ds

Import data

After you create an empty dataset, you can import your data into the dataset. If you used the Vertex AI SDK for Python to create the dataset, then you might have already imported data when you created the dataset. If so, you can skip this section.

Select the tab below for your objective:

Single-label classification

REST

Before using any of the request data, make the following replacements:

  • LOCATION: Region where the dataset is located. For example, us-central1.
  • PROJECT_ID: Your project ID.
  • DATASET_ID: ID of the dataset.
  • IMPORT_FILE_URI: Path to the CSV or JSON Lines file in Cloud Storage that lists data items stored in Cloud Storage to use for model training; for import file formats and limitations, see Preparing image data.
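If you generate the JSON Lines import file programmatically, each line is one JSON object. The sketch below assumes the single-label schema's imageGcsUri and classificationAnnotation fields (check Preparing image data for the authoritative field names); the bucket paths and labels are hypothetical.

```python
import json

# Hypothetical mapping of Cloud Storage image URIs to their labels.
labeled_images = {
    "gs://my-bucket/flowers/daisy_001.jpg": "daisy",
    "gs://my-bucket/flowers/rose_004.jpg": "rose",
}

# Write one JSON object per line, as JSON Lines requires.
with open("import_file.jsonl", "w") as f:
    for uri, label in labeled_images.items():
        record = {
            "imageGcsUri": uri,
            "classificationAnnotation": {"displayName": label},
        }
        f.write(json.dumps(record) + "\n")
```

Upload the resulting file to Cloud Storage and use its gs:// path as IMPORT_FILE_URI.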

HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/datasets/DATASET_ID:import

Request JSON body:

{
  "import_configs": [
    {
      "gcs_source": {
        "uris": ["IMPORT_FILE_URI"]
      },
      "import_schema_uri": "gs://google-cloud-aiplatform/schema/dataset/ioformat/image_classification_single_label_io_format_1.0.0.yaml"
    }
  ]
}

To send your request, choose one of these options:

curl

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you into the gcloud CLI. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/datasets/DATASET_ID:import"

PowerShell

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/datasets/DATASET_ID:import" | Select-Object -Expand Content

You should see output similar to the following. You can use the OPERATION_ID in the response to get the status of the operation.

{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION/datasets/DATASET_ID/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.aiplatform.v1.ImportDataOperationMetadata",
    "genericMetadata": {
      "createTime": "2020-07-08T20:32:02.543801Z",
      "updateTime": "2020-07-08T20:32:02.543801Z"
    }
  }
}

Java

Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

import com.google.api.gax.longrunning.OperationFuture;
import com.google.cloud.aiplatform.v1.DatasetName;
import com.google.cloud.aiplatform.v1.DatasetServiceClient;
import com.google.cloud.aiplatform.v1.DatasetServiceSettings;
import com.google.cloud.aiplatform.v1.GcsSource;
import com.google.cloud.aiplatform.v1.ImportDataConfig;
import com.google.cloud.aiplatform.v1.ImportDataOperationMetadata;
import com.google.cloud.aiplatform.v1.ImportDataResponse;
import java.io.IOException;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ImportDataImageClassificationSample {

  public static void main(String[] args)
      throws IOException, InterruptedException, ExecutionException, TimeoutException {
    // TODO(developer): Replace these variables before running the sample.
    String project = "YOUR_PROJECT_ID";
    String datasetId = "YOUR_DATASET_ID";
    String gcsSourceUri =
        "gs://YOUR_GCS_SOURCE_BUCKET/path_to_your_image_source/[file.csv/file.jsonl]";
    importDataImageClassificationSample(project, datasetId, gcsSourceUri);
  }

  static void importDataImageClassificationSample(
      String project, String datasetId, String gcsSourceUri)
      throws IOException, InterruptedException, ExecutionException, TimeoutException {
    DatasetServiceSettings datasetServiceSettings =
        DatasetServiceSettings.newBuilder()
            .setEndpoint("us-central1-aiplatform.googleapis.com:443")
            .build();

    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (DatasetServiceClient datasetServiceClient =
        DatasetServiceClient.create(datasetServiceSettings)) {
      String location = "us-central1";
      String importSchemaUri =
          "gs://google-cloud-aiplatform/schema/dataset/ioformat/"
              + "image_classification_single_label_io_format_1.0.0.yaml";
      GcsSource.Builder gcsSource = GcsSource.newBuilder();
      gcsSource.addUris(gcsSourceUri);
      DatasetName datasetName = DatasetName.of(project, location, datasetId);

      List<ImportDataConfig> importDataConfigList =
          Collections.singletonList(
              ImportDataConfig.newBuilder()
                  .setGcsSource(gcsSource)
                  .setImportSchemaUri(importSchemaUri)
                  .build());

      OperationFuture<ImportDataResponse, ImportDataOperationMetadata> importDataResponseFuture =
          datasetServiceClient.importDataAsync(datasetName, importDataConfigList);
      System.out.format(
          "Operation name: %s\n", importDataResponseFuture.getInitialFuture().get().getName());
      System.out.println("Waiting for operation to finish...");
      ImportDataResponse importDataResponse = importDataResponseFuture.get(300, TimeUnit.SECONDS);
      System.out.format(
          "Import Data Image Classification Response: %s\n", importDataResponse.toString());
    }
  }
}

Node.js

Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

/**
 * TODO(developer): Uncomment these variables before running the sample.
 * (Not necessary if passing values as arguments)
 */
// const datasetId = 'YOUR_DATASET_ID';
// const gcsSourceUri = 'YOUR_GCS_SOURCE_URI';
// eg. "gs://<your-gcs-bucket>/<import_source_path>/[file.csv/file.jsonl]"
// const project = 'YOUR_PROJECT_ID';
// const location = 'YOUR_PROJECT_LOCATION';

// Imports the Google Cloud Dataset Service Client library
const {DatasetServiceClient} = require('@google-cloud/aiplatform');

// Specifies the location of the api endpoint
const clientOptions = {
  apiEndpoint: 'us-central1-aiplatform.googleapis.com',
};

const datasetServiceClient = new DatasetServiceClient(clientOptions);

async function importDataImageClassification() {
  const name = datasetServiceClient.datasetPath(project, location, datasetId);
  // Here we use only one import config with one source
  const importConfigs = [
    {
      gcsSource: {uris: [gcsSourceUri]},
      importSchemaUri:
        'gs://google-cloud-aiplatform/schema/dataset/ioformat/image_classification_single_label_io_format_1.0.0.yaml',
    },
  ];
  const request = {
    name,
    importConfigs,
  };

  // Create Import Data Request
  const [response] = await datasetServiceClient.importData(request);
  console.log(`Long running operation: ${response.name}`);

  // Wait for operation to complete
  const [importDataResponse] = await response.promise();
  console.log(
    `Import data image classification response : ${JSON.stringify(importDataResponse)}`
  );
}
importDataImageClassification();

Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

def image_dataset_import_data_sample(
    project: str, location: str, src_uris: list, import_schema_uri: str, dataset_id: str
):
    aiplatform.init(project=project, location=location)

    ds = aiplatform.ImageDataset(dataset_id)

    ds = ds.import_data(
        gcs_source=src_uris, import_schema_uri=import_schema_uri, sync=True
    )

    print(ds.display_name)
    print(ds.name)
    print(ds.resource_name)

    return ds

Multi-label classification

REST

Before using any of the request data, make the following replacements:

  • LOCATION: Region where the dataset is located. For example, us-central1.
  • PROJECT_ID: Your project ID.
  • DATASET_ID: ID of the dataset.
  • IMPORT_FILE_URI: Path to the CSV or JSON Lines file in Cloud Storage that lists data items stored in Cloud Storage to use for model training; for import file formats and limitations, see Preparing image data.

HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/datasets/DATASET_ID:import

Request JSON body:

{
  "import_configs": [
    {
      "gcs_source": {
        "uris": ["IMPORT_FILE_URI"]
      },
      "import_schema_uri": "gs://google-cloud-aiplatform/schema/dataset/ioformat/image_classification_multi_label_io_format_1.0.0.yaml"
    }
  ]
}

To send your request, choose one of these options:

curl

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you into the gcloud CLI. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/datasets/DATASET_ID:import"

PowerShell

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/datasets/DATASET_ID:import" | Select-Object -Expand Content

You should see output similar to the following. You can use the OPERATION_ID in the response to get the status of the operation.

{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION/datasets/DATASET_ID/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.aiplatform.v1.ImportDataOperationMetadata",
    "genericMetadata": {
      "createTime": "2020-07-08T20:32:02.543801Z",
      "updateTime": "2020-07-08T20:32:02.543801Z"
    }
  }
}

Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

def image_dataset_import_data_sample(
    project: str, location: str, src_uris: list, import_schema_uri: str, dataset_id: str
):
    aiplatform.init(project=project, location=location)

    ds = aiplatform.ImageDataset(dataset_id)

    ds = ds.import_data(
        gcs_source=src_uris, import_schema_uri=import_schema_uri, sync=True
    )

    print(ds.display_name)
    print(ds.name)
    print(ds.resource_name)

    return ds

Get operation status

Some requests start long-running operations that require time to complete. These requests return an operation name, which you can use to view the operation's status or cancel the operation. Vertex AI provides helper methods to make calls against long-running operations. For more information, see Working with long-running operations.
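As a minimal sketch (assuming the v1 REST surface, where operations.get is served at the regional endpoint under the operation's resource name), the operation name from a create or import response can be turned directly into a polling URL for an authenticated GET request:

```python
def operation_status_url(operation_name: str) -> str:
    """Build a v1 operations.get URL for a Vertex AI long-running operation.

    The region embedded in the resource name selects the regional endpoint.
    """
    # operation_name looks like:
    # projects/{p}/locations/{loc}/datasets/{d}/operations/{op}
    parts = operation_name.split("/")
    location = parts[parts.index("locations") + 1]
    return f"https://{location}-aiplatform.googleapis.com/v1/{operation_name}"


# Hypothetical operation name in the shape returned by the create/import calls above.
name = (
    "projects/123456789/locations/us-central1/"
    "datasets/1122334455/operations/5566778899"
)
print(operation_status_url(name))
```

Send a GET to the printed URL with an Authorization: Bearer header (for example, a token from gcloud auth print-access-token) to check whether the operation is done.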

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-11-24 UTC.