Prepare image training data for classification

This page describes how to prepare image training data for use in a Vertex AI dataset to train an image classification model.

The following sections describe the data requirements, the input/output schema file, and the format of the data import files (JSON Lines and CSV) that the schema defines.

Permissions

To use images from a Cloud Storage bucket, you must grant the Vertex AI Service Agent the Storage Object Viewer role for the bucket. The Service Agent is a Google-managed service account that Vertex AI uses to access your data on your behalf. For a more detailed explanation, see Service agents.
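As a minimal sketch, the grant can be expressed as an edit to the bucket's IAM policy in its JSON form (for example, the output of `gcloud storage buckets get-iam-policy gs://BUCKET --format=json`). The service agent address pattern `service-PROJECT_NUMBER@gcp-sa-aiplatform.iam.gserviceaccount.com` is an assumption here, so confirm the exact principal on your project's IAM page; both function names are hypothetical.

```python
import copy

OBJECT_VIEWER_ROLE = "roles/storage.objectViewer"


def vertex_service_agent_member(project_number: str) -> str:
    """Build the IAM member string for the Vertex AI Service Agent.

    Assumed address pattern (verify in your project):
    service-PROJECT_NUMBER@gcp-sa-aiplatform.iam.gserviceaccount.com
    """
    return (
        f"serviceAccount:service-{project_number}"
        "@gcp-sa-aiplatform.iam.gserviceaccount.com"
    )


def grant_object_viewer(policy: dict, member: str) -> dict:
    """Return a copy of a bucket IAM policy (JSON form) that grants
    `member` the Storage Object Viewer role. The input is not modified."""
    updated = copy.deepcopy(policy)
    bindings = updated.setdefault("bindings", [])
    for binding in bindings:
        # Only extend an unconditional binding for this role.
        if binding.get("role") == OBJECT_VIEWER_ROLE and "condition" not in binding:
            members = binding.setdefault("members", [])
            if member not in members:  # avoid duplicate entries
                members.append(member)
            return updated
    # No suitable binding exists yet: add one.
    bindings.append({"role": OBJECT_VIEWER_ROLE, "members": [member]})
    return updated
```

The edited policy could be written back with `gcloud storage buckets set-iam-policy gs://BUCKET POLICY_FILE`; for a single grant, `gcloud storage buckets add-iam-policy-binding gs://BUCKET --member=MEMBER --role=roles/storage.objectViewer` is the more direct route. Copying the policy keeps its `etag`, which lets the write-back detect concurrent changes.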

Single-label classification

Data requirements

YAML schema file

Use the following publicly accessible schema file to import single-label image classification annotations. This schema file dictates the format of the data input files. This file's structure follows the OpenAPI schema.

gs://google-cloud-aiplatform/schema/dataset/ioformat/image_classification_single_label_io_format_1.0.0.yaml

Full schema file

title: ImageClassificationSingleLabel
description: >
  Import and export format for importing/exporting images together with
  single-label classification annotation. Can be used in
  Dataset.import_schema_uri field.
type: object
required:
- imageGcsUri
properties:
  imageGcsUri:
    type: string
    description: >
      A Cloud Storage URI pointing to an image. Up to 30MB in size.
      Supported file mime types: `image/jpeg`, `image/gif`, `image/png`,
      `image/webp`, `image/bmp`, `image/tiff`, `image/vnd.microsoft.icon`.
  classificationAnnotation:
    type: object
    description: Single classification Annotation on the image.
    properties:
      displayName:
        type: string
        description: >
          It will be imported as/exported from AnnotationSpec's display name,
          i.e. the name of the label/class.
      annotationResourceLabels:
        description: Resource labels on the Annotation.
        type: object
        additionalProperties:
          type: string
  dataItemResourceLabels:
    description: Resource labels on the DataItem.
    type: object
    additionalProperties:
      type: string
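The schema's structural rules can be checked locally before an import. The following is a minimal, hand-written sketch (it does not use a JSON Schema library and is not the service's own validation) that covers only what the schema above states: a required string `imageGcsUri`, an optional `classificationAnnotation` object with a string `displayName`, and string-to-string label maps. The `gs://` prefix check is an added assumption, and the size and MIME-type limits are not checked. The function name is hypothetical.

```python
from typing import Any


def _is_string_map(value: Any) -> bool:
    """True if value is a dict whose keys and values are all strings."""
    return isinstance(value, dict) and all(
        isinstance(k, str) and isinstance(v, str) for k, v in value.items()
    )


def validate_single_label_record(record: Any) -> list[str]:
    """Return the problems found in one single-label record (empty list if none)."""
    if not isinstance(record, dict):
        return ["record must be a JSON object"]
    errors = []
    uri = record.get("imageGcsUri")
    if not isinstance(uri, str):
        errors.append("imageGcsUri is required and must be a string")
    elif not uri.startswith("gs://"):
        errors.append("imageGcsUri should be a gs:// URI")
    annotation = record.get("classificationAnnotation")
    if annotation is not None:
        if not isinstance(annotation, dict):
            errors.append("classificationAnnotation must be an object")
        else:
            if "displayName" in annotation and not isinstance(annotation["displayName"], str):
                errors.append("classificationAnnotation.displayName must be a string")
            labels = annotation.get("annotationResourceLabels")
            if labels is not None and not _is_string_map(labels):
                errors.append("annotationResourceLabels must map strings to strings")
    item_labels = record.get("dataItemResourceLabels")
    if item_labels is not None and not _is_string_map(item_labels):
        errors.append("dataItemResourceLabels must map strings to strings")
    return errors
```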

Input files

JSON Lines

Each line contains one JSON object with the following structure (shown expanded here for readability):

{
  "imageGcsUri": "gs://bucket/filename.ext",
  "classificationAnnotation": {
    "displayName": "LABEL",
    "annotationResourceLabels": {
      "aiplatform.googleapis.com/annotation_set_name": "displayName",
      "env": "prod"
    }
  },
  "dataItemResourceLabels": {
    "aiplatform.googleapis.com/ml_use": "training/test/validation"
  }
}

Field notes:

  • imageGcsUri - The only required field.
  • annotationResourceLabels - Can contain any number of key-value string pairs. The only system-reserved key-value pair is the following:
    • "aiplatform.googleapis.com/annotation_set_name" : "value"

    Where value is one of the display names of the existing annotation sets in the dataset.

  • dataItemResourceLabels - Can contain any number of key-value string pairs. The only system-reserved key-value pair is the following, which specifies the machine learning use set of the data item (one of training, test, or validation):
    • "aiplatform.googleapis.com/ml_use" : "training/test/validation"
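A minimal sketch of assembling one JSON Lines record that follows these field notes: only `imageGcsUri` is always present, and the label and the two system-reserved keys are added only when supplied. The helper name and its parameters are hypothetical; `json.dumps` keeps each record on a single line, as the format requires.

```python
import json
from typing import Optional

ML_USE_KEY = "aiplatform.googleapis.com/ml_use"
ANNOTATION_SET_KEY = "aiplatform.googleapis.com/annotation_set_name"
ML_USE_VALUES = {"training", "test", "validation"}


def single_label_line(
    image_uri: str,
    label: Optional[str] = None,
    ml_use: Optional[str] = None,
    annotation_set: Optional[str] = None,
) -> str:
    """Serialize one single-label image record as a JSON Lines line.

    `annotation_set` is used only when `label` is given, because the
    reserved annotation-set key lives inside the annotation.
    """
    record: dict = {"imageGcsUri": image_uri}  # the only required field
    if label is not None:
        annotation: dict = {"displayName": label}
        if annotation_set is not None:
            # Must name an existing annotation set in the dataset.
            annotation["annotationResourceLabels"] = {ANNOTATION_SET_KEY: annotation_set}
        record["classificationAnnotation"] = annotation
    if ml_use is not None:
        if ml_use not in ML_USE_VALUES:
            raise ValueError(f"ml_use must be one of {sorted(ML_USE_VALUES)}")
        record["dataItemResourceLabels"] = {ML_USE_KEY: ml_use}
    return json.dumps(record)
```

For example, `single_label_line("gs://bucket/filename1.jpeg", "daisy", "test")` produces the same object as the first line of the example file that follows.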

Example JSON Lines - image_classification_single_label.jsonl:

{"imageGcsUri": "gs://bucket/filename1.jpeg", "classificationAnnotation": {"displayName": "daisy"}, "dataItemResourceLabels": {"aiplatform.googleapis.com/ml_use": "test"}}
{"imageGcsUri": "gs://bucket/filename2.gif", "classificationAnnotation": {"displayName": "dandelion"}, "dataItemResourceLabels": {"aiplatform.googleapis.com/ml_use": "training"}}
{"imageGcsUri": "gs://bucket/filename3.png", "classificationAnnotation": {"displayName": "roses"}, "dataItemResourceLabels": {"aiplatform.googleapis.com/ml_use": "training"}}
{"imageGcsUri": "gs://bucket/filename4.bmp", "classificationAnnotation": {"displayName": "sunflowers"}, "dataItemResourceLabels": {"aiplatform.googleapis.com/ml_use": "training"}}
{"imageGcsUri": "gs://bucket/filename5.tiff", "classificationAnnotation": {"displayName": "tulips"}, "dataItemResourceLabels": {"aiplatform.googleapis.com/ml_use": "validation"}}
...
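Because every line of such a file is an independent JSON object, it can be sanity-checked one line at a time. The sketch below (the function name is hypothetical) parses JSON Lines text, skips blank lines, reports the first malformed line by number, and counts data items per `ml_use` value; items without that label are counted under `unassigned`, a placeholder name chosen for this example.

```python
import json
from collections import Counter

ML_USE_KEY = "aiplatform.googleapis.com/ml_use"


def count_ml_use(jsonl_text: str) -> Counter:
    """Count records per ml_use value in JSON Lines text."""
    counts: Counter = Counter()
    for line_number, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines, such as a trailing newline
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_number} is not valid JSON: {exc}") from exc
        if not isinstance(record, dict):
            raise ValueError(f"line {line_number} is not a JSON object")
        labels = record.get("dataItemResourceLabels", {})
        counts[labels.get(ML_USE_KEY, "unassigned")] += 1
    return counts
```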

CSV

CSV format:

[ML_USE],GCS_FILE_PATH,[LABEL]
List of columns
  • ML_USE (Optional) - For data split purposes when training a model. Use TRAINING, TEST, or VALIDATION. For more information about manual data splitting, see About data splits for AutoML models.
  • GCS_FILE_PATH - This field contains the Cloud Storage URI for the image. Cloud Storage URIs are case-sensitive.
  • LABEL (Optional) - Labels must start with a letter and contain only letters, numbers, and underscores.
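A minimal sketch of reading these rows with Python's `csv` module. Because ML_USE is optional and comes first, this sketch assumes that a first column not starting with `gs://` is the ML_USE column, and it compares ML_USE case-insensitively, since the example file below uses lowercase values. It also reads "letters" in the label rule as ASCII letters, which is an assumption. `ImageRow` and both function names are hypothetical.

```python
import csv
import io
import re
from typing import NamedTuple, Optional

# ASCII-only reading of the rule: a letter, then letters, digits, or underscores.
LABEL_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*")
ML_USE_VALUES = {"TRAINING", "TEST", "VALIDATION"}


class ImageRow(NamedTuple):
    ml_use: Optional[str]
    gcs_path: str
    label: Optional[str]


def parse_single_label_row(fields: list[str]) -> ImageRow:
    """Interpret one CSV row of the form [ML_USE],GCS_FILE_PATH,[LABEL]."""
    fields = [f.strip() for f in fields]
    ml_use = None
    if fields and not fields[0].startswith("gs://"):
        # Assumption: a leading non-URI column is ML_USE.
        ml_use = fields[0].upper()
        if ml_use not in ML_USE_VALUES:
            raise ValueError(f"unknown ML_USE value: {fields[0]!r}")
        fields = fields[1:]
    if not fields or not fields[0].startswith("gs://"):
        raise ValueError("missing GCS_FILE_PATH")
    gcs_path, rest = fields[0], fields[1:]
    if len(rest) > 1:
        raise ValueError("single-label rows take at most one LABEL")
    label = rest[0] if rest and rest[0] else None
    if label is not None and not LABEL_PATTERN.fullmatch(label):
        raise ValueError(f"invalid label: {label!r}")
    return ImageRow(ml_use, gcs_path, label)


def parse_single_label_csv(text: str) -> list[ImageRow]:
    """Parse every non-empty row of a single-label import CSV."""
    return [parse_single_label_row(row) for row in csv.reader(io.StringIO(text)) if row]
```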

Example CSV - image_classification_single_label.csv:

test,gs://bucket/filename1.jpeg,daisy
training,gs://bucket/filename2.gif,dandelion
gs://bucket/filename3.png
gs://bucket/filename4.bmp,sunflowers
validation,gs://bucket/filename5.tiff,tulips
...

Multi-label classification

Data requirements

YAML schema file

Use the following publicly accessible schema file to import multi-label image classification annotations. This schema file dictates the format of the data input files. This file's structure follows the OpenAPI schema.

gs://google-cloud-aiplatform/schema/dataset/ioformat/image_classification_multi_label_io_format_1.0.0.yaml

Full schema file

title: ImageClassificationMultiLabel
description: >
  Import and export format for importing/exporting images together with
  multi-label classification annotations. Can be used in
  Dataset.import_schema_uri field.
type: object
required:
- imageGcsUri
properties:
  imageGcsUri:
    type: string
    description: >
      A Cloud Storage URI pointing to an image. Up to 30MB in size.
      Supported file mime types: `image/jpeg`, `image/gif`, `image/png`,
      `image/webp`, `image/bmp`, `image/tiff`, `image/vnd.microsoft.icon`.
  classificationAnnotations:
    type: array
    description: Multiple classification Annotations on the image.
    items:
      type: object
      description: Classification annotation.
      properties:
        displayName:
          type: string
          description: >
            It will be imported as/exported from AnnotationSpec's display name,
            i.e. the name of the label/class.
        annotationResourceLabels:
          description: Resource labels on the Annotation.
          type: object
          additionalProperties:
            type: string
  dataItemResourceLabels:
    description: Resource labels on the DataItem.
    type: object
    additionalProperties:
      type: string
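The main structural difference from the single-label schema is that `classificationAnnotations` is an array of annotation objects. A minimal, hand-written sketch of a local check for that shape follows (not the service's own validation; the function name is hypothetical). It also flags the singular `classificationAnnotation` key as a likely mix-up; the schema itself does not forbid extra keys, so treating it as a problem is an assumption of this sketch.

```python
from typing import Any


def _is_string_map(value: Any) -> bool:
    """True if value is a dict whose keys and values are all strings."""
    return isinstance(value, dict) and all(
        isinstance(k, str) and isinstance(v, str) for k, v in value.items()
    )


def validate_multi_label_record(record: Any) -> list[str]:
    """Return the problems found in one multi-label record (empty list if none)."""
    if not isinstance(record, dict):
        return ["record must be a JSON object"]
    errors = []
    if not isinstance(record.get("imageGcsUri"), str):
        errors.append("imageGcsUri is required and must be a string")
    annotations = record.get("classificationAnnotations")
    if annotations is not None:
        if not isinstance(annotations, list):
            errors.append("classificationAnnotations must be an array")
        else:
            for i, item in enumerate(annotations):
                if not isinstance(item, dict):
                    errors.append(f"classificationAnnotations[{i}] must be an object")
                    continue
                if "displayName" in item and not isinstance(item["displayName"], str):
                    errors.append(f"classificationAnnotations[{i}].displayName must be a string")
                labels = item.get("annotationResourceLabels")
                if labels is not None and not _is_string_map(labels):
                    errors.append(
                        f"classificationAnnotations[{i}].annotationResourceLabels "
                        "must map strings to strings"
                    )
    if "classificationAnnotation" in record:
        # Lint-style check: the singular key belongs to the single-label format.
        errors.append("found classificationAnnotation; multi-label records use classificationAnnotations")
    item_labels = record.get("dataItemResourceLabels")
    if item_labels is not None and not _is_string_map(item_labels):
        errors.append("dataItemResourceLabels must map strings to strings")
    return errors
```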

Input files

JSON Lines

Each line contains one JSON object with the following structure (shown expanded here for readability):

{
  "imageGcsUri": "gs://bucket/filename.ext",
  "classificationAnnotations": [
    {
      "displayName": "LABEL1",
      "annotationResourceLabels": {
        "aiplatform.googleapis.com/annotation_set_name": "displayName",
        "label_type": "flower_type"
      }
    },
    {
      "displayName": "LABEL2",
      "annotationResourceLabels": {
        "aiplatform.googleapis.com/annotation_set_name": "displayName",
        "label_type": "image_shot_type"
      }
    }
  ],
  "dataItemResourceLabels": {
    "aiplatform.googleapis.com/ml_use": "training/test/validation"
  }
}

Field notes:

  • imageGcsUri - The only required field.
  • annotationResourceLabels - Can contain any number of key-value string pairs. The only system-reserved key-value pair is the following:
    • "aiplatform.googleapis.com/annotation_set_name" : "value"

    Where value is one of the display names of the existing annotation sets in the dataset.

  • dataItemResourceLabels - Can contain any number of key-value string pairs. The only system-reserved key-value pair is the following, which specifies the machine learning use set of the data item (one of training, test, or validation):
    • "aiplatform.googleapis.com/ml_use" : "training/test/validation"
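In the multi-label format, each annotation in the array carries its own `annotationResourceLabels`, so different labels on the same image can belong to different annotation sets. A minimal sketch of building one such line, where `labels` is a sequence of hypothetical `(display_name, annotation_set)` pairs and `annotation_set` may be `None`; the function name is also hypothetical.

```python
import json
from typing import Iterable, Optional, Tuple

ML_USE_KEY = "aiplatform.googleapis.com/ml_use"
ANNOTATION_SET_KEY = "aiplatform.googleapis.com/annotation_set_name"
ML_USE_VALUES = ("training", "test", "validation")


def multi_label_line(
    image_uri: str,
    labels: Iterable[Tuple[str, Optional[str]]] = (),
    ml_use: Optional[str] = None,
) -> str:
    """Serialize one multi-label image record as a JSON Lines line."""
    record: dict = {"imageGcsUri": image_uri}  # the only required field
    annotations = []
    for display_name, annotation_set in labels:
        annotation: dict = {"displayName": display_name}
        if annotation_set is not None:
            # Per-annotation reserved key naming an existing annotation set.
            annotation["annotationResourceLabels"] = {ANNOTATION_SET_KEY: annotation_set}
        annotations.append(annotation)
    if annotations:
        record["classificationAnnotations"] = annotations
    if ml_use is not None:
        if ml_use not in ML_USE_VALUES:
            raise ValueError("ml_use must be training, test, or validation")
        record["dataItemResourceLabels"] = {ML_USE_KEY: ml_use}
    return json.dumps(record)
```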

Example JSON Lines - image_classification_multi_label.jsonl:

{"imageGcsUri": "gs://bucket/filename1.jpeg", "classificationAnnotations": [{"displayName": "daisy"}, {"displayName": "full_shot"}], "dataItemResourceLabels": {"aiplatform.googleapis.com/ml_use": "test"}}
{"imageGcsUri": "gs://bucket/filename2.gif", "classificationAnnotations": [{"displayName": "dandelion"}, {"displayName": "medium_shot"}], "dataItemResourceLabels": {"aiplatform.googleapis.com/ml_use": "training"}}
{"imageGcsUri": "gs://bucket/filename3.png", "classificationAnnotations": [{"displayName": "roses"}, {"displayName": "extreme_closeup"}], "dataItemResourceLabels": {"aiplatform.googleapis.com/ml_use": "training"}}
{"imageGcsUri": "gs://bucket/filename4.bmp", "classificationAnnotations": [{"displayName": "sunflowers"}, {"displayName": "closeup"}], "dataItemResourceLabels": {"aiplatform.googleapis.com/ml_use": "training"}}
{"imageGcsUri": "gs://bucket/filename5.tiff", "classificationAnnotations": [{"displayName": "tulips"}, {"displayName": "extreme_closeup"}], "dataItemResourceLabels": {"aiplatform.googleapis.com/ml_use": "validation"}}
...

CSV

CSV format:

[ML_USE],GCS_FILE_PATH,[LABEL1,LABEL2,...LABELn]
List of columns
  • ML_USE (Optional) - For data split purposes when training a model. Use TRAINING, TEST, or VALIDATION. For more information about manual data splitting, see About data splits for AutoML models.
  • GCS_FILE_PATH - This field contains the Cloud Storage URI for the image. Cloud Storage URIs are case-sensitive.
  • LABEL (Optional) - Labels must start with a letter and contain only letters, numbers, and underscores.

Example CSV - image_classification_multi_label.csv:

test,gs://bucket/filename1.jpeg,daisy,full_shot
training,gs://bucket/filename2.gif,dandelion,medium_shot
gs://bucket/filename3.png
gs://bucket/filename4.bmp,sunflowers,closeup
validation,gs://bucket/filename5.tiff,tulips,extreme_closeup
...
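Since both input formats describe the same data, a CSV file like this one can be converted into the JSON Lines form. The following is a minimal sketch under stated assumptions: a leading column that does not start with `gs://` is treated as ML_USE, ML_USE is lowercased to match the JSON Lines values, empty cells are dropped, and label-format checks are omitted. The function name is hypothetical; a single-label row is simply a row with one label.

```python
import csv
import io
import json

ML_USE_KEY = "aiplatform.googleapis.com/ml_use"
ML_USE_VALUES = {"training", "test", "validation"}


def multi_label_csv_to_jsonl(csv_text: str) -> str:
    """Convert [ML_USE],GCS_FILE_PATH,[LABEL1,...,LABELn] rows to JSON Lines text."""
    out_lines = []
    for row in csv.reader(io.StringIO(csv_text)):
        fields = [f.strip() for f in row if f.strip()]  # drop empty cells
        if not fields:
            continue  # skip blank rows
        split: dict = {}
        if not fields[0].startswith("gs://"):
            # Assumption: a leading non-URI column is ML_USE (case-insensitive).
            ml_use = fields.pop(0).lower()
            if ml_use not in ML_USE_VALUES:
                raise ValueError(f"unknown ML_USE value: {ml_use!r}")
            split["dataItemResourceLabels"] = {ML_USE_KEY: ml_use}
        if not fields or not fields[0].startswith("gs://"):
            raise ValueError(f"row has no gs:// path: {row!r}")
        record: dict = {"imageGcsUri": fields[0]}
        if len(fields) > 1:
            record["classificationAnnotations"] = [{"displayName": name} for name in fields[1:]]
        # Merge so that key order matches the documented examples.
        out_lines.append(json.dumps({**record, **split}))
    return "\n".join(out_lines) + ("\n" if out_lines else "")
```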

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-12-15 UTC.