Create a dataset for training forecast models
This page shows you how to create a Vertex AI dataset from your tabular data so you can start training forecast models. You can create a dataset using either the Google Cloud console or the Vertex AI API.
Before you begin
Before creating a Vertex AI dataset from your tabular data, prepare your training data.
Create an empty dataset and associate your prepared data
To create a machine learning model for forecasting, you must have a representative collection of data to train with. Use the Google Cloud console or the API to associate your prepared data with the dataset.
When you create a dataset, you also associate it with its data source. The training data can be either a CSV file in Cloud Storage or a table in BigQuery. If the data source resides in a different project, make sure you set up the required permissions.
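For example, if your training CSV files live in a Cloud Storage bucket owned by a different project, the Vertex AI service agent of the project that owns the dataset needs read access to that bucket. The following is a minimal sketch, assuming the gcloud storage CLI and the default Vertex AI service agent naming; the bucket name and project number are placeholders:

# Grant the dataset project's Vertex AI service agent read access to a
# bucket owned by another project. Replace the bucket name and PROJECT_NUMBER
# with your own values.
gcloud storage buckets add-iam-policy-binding gs://example-training-data \
  --member="serviceAccount:service-PROJECT_NUMBER@gcp-sa-aiplatform.iam.gserviceaccount.com" \
  --role="roles/storage.objectViewer"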
Google Cloud console
- In the Google Cloud console, in the Vertex AI section, go to the Datasets page.
- Click Create to open the create dataset details page.
- Modify the Dataset name field to create a descriptive dataset display name.
- Select the Tabular tab.
- Select the Forecasting objective.
- Select a region from the Region drop-down list.
- Click Create to create your empty dataset, and advance to the Source tab.
- Choose one of the following options, based on your data source.
CSV files on your computer
- Click Upload CSV files from your computer.
- Click Select files and choose all the local files to upload to a Cloud Storage bucket.
- In the Select a Cloud Storage path section, enter the path to the Cloud Storage bucket or click Browse to choose a bucket location.
CSV files in Cloud Storage
- Click Select CSV files from Cloud Storage.
- In the Select CSV files from Cloud Storage section, enter the path to the Cloud Storage bucket or click Browse to choose the location of your CSV files.
A table or view in BigQuery
- Click Select a table or view from BigQuery.
- Enter the project, dataset, and table IDs for your input file.
- Click Continue.
Your data source is associated with your dataset.
- On the Analyze tab, specify the Timestamp column and the Series identifier column for this dataset.

You can also specify these columns when you train your model, but a forecasting dataset generally has dedicated timestamp and series identifier columns, so specifying them in the dataset is a best practice. An example is shown below.
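For example, a forecasting CSV with a timestamp column and a series identifier column might look like the following (hypothetical column names and values):

timestamp,store_id,sales
2024-01-01,store_01,120
2024-01-02,store_01,135
2024-01-01,store_02,87
2024-01-02,store_02,91

Here, timestamp would be selected as the Timestamp column and store_id as the Series identifier column.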
API: CSV
REST
You use the datasets.create method to create a dataset.
Before using any of the request data, make the following replacements:
- LOCATION: Region where the dataset will be stored. This must be a region that supports dataset resources. For example, us-central1.
- PROJECT: Your project ID.
- DATASET_NAME: Display name for the dataset.
- METADATA_SCHEMA_URI: The URI to the schema file for your objective: gs://google-cloud-aiplatform/schema/dataset/metadata/time_series_1.0.0.yaml
- URI: Paths (URIs) to the Cloud Storage buckets containing the training data. There can be more than one. Each URI has the form: gs://GCSprojectId/bucketName/fileName
- PROJECT_NUMBER: Your project's automatically generated project number.
HTTP method and URL:
POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/datasets
Request JSON body:
{ "display_name": "DATASET_NAME", "metadata_schema_uri": "METADATA_SCHEMA_URI", "metadata": { "input_config": { "gcs_source": { "uri": [URI1,URI2, ...] } } }}To send your request, choose one of these options:
To send your request, choose one of these options:

curl
Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you into the gcloud CLI. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/datasets"
PowerShell
Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/datasets" | Select-Object -Expand Content
You should receive a JSON response similar to the following:
{ "name": "projects/PROJECT_NUMBER/locations/LOCATION/datasets/DATASET_ID/operations/OPERATION_ID", "metadata": { "@type": "type.googleapis.com/google.cloud.aiplatform.v1.CreateDatasetOperationMetadata", "genericMetadata": { "createTime": "2020-07-07T21:27:35.964882Z", "updateTime": "2020-07-07T21:27:35.964882Z" }}Java
Java

Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
import com.google.api.gax.longrunning.OperationFuture;
import com.google.cloud.aiplatform.v1.CreateDatasetOperationMetadata;
import com.google.cloud.aiplatform.v1.Dataset;
import com.google.cloud.aiplatform.v1.DatasetServiceClient;
import com.google.cloud.aiplatform.v1.DatasetServiceSettings;
import com.google.cloud.aiplatform.v1.LocationName;
import com.google.protobuf.Value;
import com.google.protobuf.util.JsonFormat;
import java.io.IOException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class CreateDatasetTabularGcsSample {

  public static void main(String[] args)
      throws InterruptedException, ExecutionException, TimeoutException, IOException {
    // TODO(developer): Replace these variables before running the sample.
    String project = "YOUR_PROJECT_ID";
    String datasetDisplayName = "YOUR_DATASET_DISPLAY_NAME";
    String gcsSourceUri = "gs://YOUR_GCS_SOURCE_BUCKET/path_to_your_gcs_table/file.csv";
    createDatasetTableGcs(project, datasetDisplayName, gcsSourceUri);
  }

  static void createDatasetTableGcs(String project, String datasetDisplayName, String gcsSourceUri)
      throws IOException, ExecutionException, InterruptedException, TimeoutException {
    DatasetServiceSettings settings =
        DatasetServiceSettings.newBuilder()
            .setEndpoint("us-central1-aiplatform.googleapis.com:443")
            .build();

    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (DatasetServiceClient datasetServiceClient = DatasetServiceClient.create(settings)) {
      String location = "us-central1";
      String metadataSchemaUri =
          "gs://google-cloud-aiplatform/schema/dataset/metadata/tables_1.0.0.yaml";
      LocationName locationName = LocationName.of(project, location);

      String jsonString =
          "{\"input_config\": {\"gcs_source\": {\"uri\": [\"" + gcsSourceUri + "\"]}}}";
      Value.Builder metaData = Value.newBuilder();
      JsonFormat.parser().merge(jsonString, metaData);

      Dataset dataset =
          Dataset.newBuilder()
              .setDisplayName(datasetDisplayName)
              .setMetadataSchemaUri(metadataSchemaUri)
              .setMetadata(metaData)
              .build();

      OperationFuture<Dataset, CreateDatasetOperationMetadata> datasetFuture =
          datasetServiceClient.createDatasetAsync(locationName, dataset);
      System.out.format("Operation name: %s\n", datasetFuture.getInitialFuture().get().getName());
      System.out.println("Waiting for operation to finish...");
      Dataset datasetResponse = datasetFuture.get(300, TimeUnit.SECONDS);

      System.out.println("Create Dataset Table GCS sample");
      System.out.format("Name: %s\n", datasetResponse.getName());
      System.out.format("Display Name: %s\n", datasetResponse.getDisplayName());
      System.out.format("Metadata Schema Uri: %s\n", datasetResponse.getMetadataSchemaUri());
      System.out.format("Metadata: %s\n", datasetResponse.getMetadata());
    }
  }
}
Node.js

Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
/**
 * TODO(developer): Uncomment these variables before running the sample.
 * (Not necessary if passing values as arguments)
 */
// const datasetDisplayName = 'YOUR_DATASET_DISPLAY_NAME';
// const gcsSourceUri = 'YOUR_GCS_SOURCE_URI';
// const project = 'YOUR_PROJECT_ID';
// const location = 'YOUR_PROJECT_LOCATION';

// Imports the Google Cloud Dataset Service Client library
const {DatasetServiceClient} = require('@google-cloud/aiplatform');

// Specifies the location of the api endpoint
const clientOptions = {
  apiEndpoint: 'us-central1-aiplatform.googleapis.com',
};

// Instantiates a client
const datasetServiceClient = new DatasetServiceClient(clientOptions);

async function createDatasetTabularGcs() {
  // Configure the parent resource
  const parent = `projects/${project}/locations/${location}`;
  const metadata = {
    structValue: {
      fields: {
        inputConfig: {
          structValue: {
            fields: {
              gcsSource: {
                structValue: {
                  fields: {
                    uri: {
                      listValue: {
                        values: [{stringValue: gcsSourceUri}],
                      },
                    },
                  },
                },
              },
            },
          },
        },
      },
    },
  };
  // Configure the dataset resource
  const dataset = {
    displayName: datasetDisplayName,
    metadataSchemaUri:
      'gs://google-cloud-aiplatform/schema/dataset/metadata/tabular_1.0.0.yaml',
    metadata: metadata,
  };
  const request = {
    parent,
    dataset,
  };

  // Create dataset request
  const [response] = await datasetServiceClient.createDataset(request);
  console.log(`Long running operation: ${response.name}`);

  // Wait for operation to complete
  await response.promise();
  const result = response.result;

  console.log('Create dataset tabular gcs response');
  console.log(`\tName: ${result.name}`);
  console.log(`\tDisplay name: ${result.displayName}`);
  console.log(`\tMetadata schema uri: ${result.metadataSchemaUri}`);
  console.log(`\tMetadata: ${JSON.stringify(result.metadata)}`);
}
createDatasetTabularGcs();
Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
from typing import List, Union

from google.cloud import aiplatform


def create_and_import_dataset_time_series_gcs_sample(
    display_name: str,
    project: str,
    location: str,
    gcs_source: Union[str, List[str]],
):
    aiplatform.init(project=project, location=location)

    dataset = aiplatform.TimeSeriesDataset.create(
        display_name=display_name,
        gcs_source=gcs_source,
    )

    dataset.wait()

    print(f'\tDataset: "{dataset.display_name}"')
    print(f'\tname: "{dataset.resource_name}"')
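As a usage sketch, you might call the sample function with hypothetical values like these (the project, bucket, and file names are placeholders):

# Hypothetical invocation; replace all values with your own.
create_and_import_dataset_time_series_gcs_sample(
    display_name="sales_forecast_dataset",
    project="example-project",
    location="us-central1",
    gcs_source="gs://example-bucket/sales_train.csv",
)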
API: BigQuery

REST
You use the datasets.create method to create a dataset.
Before using any of the request data, make the following replacements:
- LOCATION: Region where the dataset will be stored. This must be a region that supports dataset resources. For example, us-central1.
- PROJECT: Your project ID.
- DATASET_NAME: Display name for the dataset.
- METADATA_SCHEMA_URI: The URI to the schema file for your objective: gs://google-cloud-aiplatform/schema/dataset/metadata/time_series_1.0.0.yaml
- URI: Path to the BigQuery table containing the training data, in the form: bq://bqprojectId.bqDatasetId.bqTableId
- PROJECT_NUMBER: Your project's automatically generated project number.
HTTP method and URL:
POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/datasets
Request JSON body:
{ "display_name": "DATASET_NAME", "metadata_schema_uri": "METADATA_SCHEMA_URI", "metadata": { "input_config": { "bigquery_source" :{ "uri": "URI } } }}To send your request, choose one of these options:
To send your request, choose one of these options:

curl
Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you into the gcloud CLI. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/datasets"
PowerShell
Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/datasets" | Select-Object -Expand Content
You should receive a JSON response similar to the following:
{ "name": "projects/PROJECT_NUMBER/locations/LOCATION/datasets/DATASET_ID/operations/OPERATION_ID", "metadata": { "@type": "type.googleapis.com/google.cloud.aiplatform.v1.CreateDatasetOperationMetadata", "genericMetadata": { "createTime": "2020-07-07T21:27:35.964882Z", "updateTime": "2020-07-07T21:27:35.964882Z" }}Java
Java

Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
import com.google.api.gax.longrunning.OperationFuture;
import com.google.cloud.aiplatform.v1.CreateDatasetOperationMetadata;
import com.google.cloud.aiplatform.v1.Dataset;
import com.google.cloud.aiplatform.v1.DatasetServiceClient;
import com.google.cloud.aiplatform.v1.DatasetServiceSettings;
import com.google.cloud.aiplatform.v1.LocationName;
import com.google.protobuf.Value;
import com.google.protobuf.util.JsonFormat;
import java.io.IOException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class CreateDatasetTabularBigquerySample {

  public static void main(String[] args)
      throws InterruptedException, ExecutionException, TimeoutException, IOException {
    // TODO(developer): Replace these variables before running the sample.
    String project = "YOUR_PROJECT_ID";
    String bigqueryDisplayName = "YOUR_DATASET_DISPLAY_NAME";
    String bigqueryUri =
        "bq://YOUR_GOOGLE_CLOUD_PROJECT_ID.BIGQUERY_DATASET_ID.BIGQUERY_TABLE_OR_VIEW_ID";
    createDatasetTableBigquery(project, bigqueryDisplayName, bigqueryUri);
  }

  static void createDatasetTableBigquery(
      String project, String bigqueryDisplayName, String bigqueryUri)
      throws IOException, ExecutionException, InterruptedException, TimeoutException {
    DatasetServiceSettings settings =
        DatasetServiceSettings.newBuilder()
            .setEndpoint("us-central1-aiplatform.googleapis.com:443")
            .build();

    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (DatasetServiceClient datasetServiceClient = DatasetServiceClient.create(settings)) {
      String location = "us-central1";
      String metadataSchemaUri =
          "gs://google-cloud-aiplatform/schema/dataset/metadata/tables_1.0.0.yaml";
      LocationName locationName = LocationName.of(project, location);

      String jsonString =
          "{\"input_config\": {\"bigquery_source\": {\"uri\": \"" + bigqueryUri + "\"}}}";
      Value.Builder metaData = Value.newBuilder();
      JsonFormat.parser().merge(jsonString, metaData);

      Dataset dataset =
          Dataset.newBuilder()
              .setDisplayName(bigqueryDisplayName)
              .setMetadataSchemaUri(metadataSchemaUri)
              .setMetadata(metaData)
              .build();

      OperationFuture<Dataset, CreateDatasetOperationMetadata> datasetFuture =
          datasetServiceClient.createDatasetAsync(locationName, dataset);
      System.out.format("Operation name: %s\n", datasetFuture.getInitialFuture().get().getName());
      System.out.println("Waiting for operation to finish...");
      Dataset datasetResponse = datasetFuture.get(300, TimeUnit.SECONDS);

      System.out.println("Create Dataset Table Bigquery sample");
      System.out.format("Name: %s\n", datasetResponse.getName());
      System.out.format("Display Name: %s\n", datasetResponse.getDisplayName());
      System.out.format("Metadata Schema Uri: %s\n", datasetResponse.getMetadataSchemaUri());
      System.out.format("Metadata: %s\n", datasetResponse.getMetadata());
    }
  }
}
Node.js

Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
/**
 * TODO(developer): Uncomment these variables before running the sample.
 * (Not necessary if passing values as arguments)
 */
// const datasetDisplayName = 'YOUR_DATASET_DISPLAY_NAME';
// const bigquerySourceUri = 'YOUR_BIGQUERY_SOURCE_URI';
// const project = 'YOUR_PROJECT_ID';
// const location = 'YOUR_PROJECT_LOCATION';

// Imports the Google Cloud Dataset Service Client library
const {DatasetServiceClient} = require('@google-cloud/aiplatform');

// Specifies the location of the api endpoint
const clientOptions = {
  apiEndpoint: 'us-central1-aiplatform.googleapis.com',
};

// Instantiates a client
const datasetServiceClient = new DatasetServiceClient(clientOptions);

async function createDatasetTabularBigquery() {
  // Configure the parent resource
  const parent = `projects/${project}/locations/${location}`;
  const metadata = {
    structValue: {
      fields: {
        inputConfig: {
          structValue: {
            fields: {
              bigquerySource: {
                structValue: {
                  fields: {
                    uri: {
                      listValue: {
                        values: [{stringValue: bigquerySourceUri}],
                      },
                    },
                  },
                },
              },
            },
          },
        },
      },
    },
  };
  // Configure the dataset resource
  const dataset = {
    displayName: datasetDisplayName,
    metadataSchemaUri:
      'gs://google-cloud-aiplatform/schema/dataset/metadata/tabular_1.0.0.yaml',
    metadata: metadata,
  };
  const request = {
    parent,
    dataset,
  };

  // Create dataset request
  const [response] = await datasetServiceClient.createDataset(request);
  console.log(`Long running operation: ${response.name}`);

  // Wait for operation to complete
  await response.promise();
  const result = response.result;

  console.log('Create dataset tabular bigquery response');
  console.log(`\tName: ${result.name}`);
  console.log(`\tDisplay name: ${result.displayName}`);
  console.log(`\tMetadata schema uri: ${result.metadataSchemaUri}`);
  console.log(`\tMetadata: ${JSON.stringify(result.metadata)}`);
}
createDatasetTabularBigquery();
Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
from google.cloud import aiplatform


def create_and_import_dataset_time_series_bigquery_sample(
    display_name: str,
    project: str,
    location: str,
    bigquery_source: str,
):
    aiplatform.init(project=project, location=location)

    dataset = aiplatform.TimeSeriesDataset.create(
        display_name=display_name,
        bigquery_source=bigquery_source,
    )

    dataset.wait()

    print(f'\tDataset: "{dataset.display_name}"')
    print(f'\tname: "{dataset.resource_name}"')
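As a usage sketch, you might call the sample function with hypothetical values like these (the project and table path are placeholders):

# Hypothetical invocation; replace all values with your own.
create_and_import_dataset_time_series_bigquery_sample(
    display_name="sales_forecast_dataset",
    project="example-project",
    location="us-central1",
    bigquery_source="bq://example-project.sales_data.sales_train",
)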
Get operation status

Some requests start long-running operations that require time to complete. These requests return an operation name, which you can use to view the operation's status or cancel the operation. Vertex AI provides helper methods to make calls against long-running operations. For more information, see Working with long-running operations.
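For example, the name field in the datasets.create response is an operation name that you can poll with a GET request until the operation reports done. A minimal sketch; all path segments below are placeholders taken from the create response:

curl -X GET \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_NUMBER/locations/LOCATION/datasets/DATASET_ID/operations/OPERATION_ID"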
What's next