Batch import feature values
To learn more, run the "Example Feature Store workflow with sample data" notebook in one of the following environments:
Open in Colab | Open in Colab Enterprise | Open in Vertex AI Workbench | View on GitHub
Batch import lets you import feature values in bulk from a valid data source. In a batch import request, you can import values for up to 100 features for an entity type. You can run only one batch import job per entity type at a time to avoid collisions.
In a batch import request, specify the location of your source data and how it maps to features in your featurestore. Because each batch import request is for a single entity type, your source data must also be for a single entity type.

After the import has successfully completed, feature values are available to subsequent read operations.
- For information about source data requirements, see Source data requirements.
- For information about how long Vertex AI Feature Store (Legacy) retains your data in the offline store, see Vertex AI Feature Store (Legacy) in Quotas and limits.
- For information about the oldest feature value timestamp that you can import, see Vertex AI Feature Store (Legacy) in Quotas and limits.
- You can't import feature values for which the timestamps indicate future dates or times.
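The constraints above (at most 100 features per request, no future timestamps) can be sanity-checked before you submit a job. The following is a minimal sketch; `validate_import_request` is a hypothetical helper, not part of the Vertex AI API.

```python
import datetime

# Documented per-request limit for batch import.
MAX_FEATURES_PER_REQUEST = 100

def validate_import_request(feature_ids, feature_times):
    """Hypothetical pre-flight check mirroring the documented limits.

    feature_ids: list of feature IDs in the request.
    feature_times: timezone-aware datetimes of the feature values.
    Raises ValueError if a documented limit is violated.
    """
    if len(feature_ids) > MAX_FEATURES_PER_REQUEST:
        raise ValueError(
            f"Up to {MAX_FEATURES_PER_REQUEST} features per request; "
            f"got {len(feature_ids)}"
        )
    now = datetime.datetime.now(datetime.timezone.utc)
    for ts in feature_times:
        # Feature values with future timestamps can't be imported.
        if ts > now:
            raise ValueError(f"Future feature timestamp not allowed: {ts.isoformat()}")
```

Running such a check locally fails fast instead of waiting minutes for the import job to reject the request.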
Import job performance
Vertex AI Feature Store (Legacy) provides high-throughput import, but the minimum latency can take a few minutes. Each request to Vertex AI Feature Store (Legacy) starts a job to complete the work. An import job takes a few minutes to complete even if you are importing a single record.
If you want to make adjustments to how a job performs, change the following two variables:
- The number of featurestore online serving nodes.
- The number of workers used for the import job. Workers process and write data into the featurestore.
The recommended number of workers is one worker for every 10 online serving nodes on the featurestore. You can go higher if the online serving load is low. You can specify a maximum of 100 workers. For more guidance, see monitor and tune resources accordingly to optimize batch import.
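The sizing guidance above can be expressed as a small helper. This is an illustrative sketch, not an official formula; `recommended_worker_count` is a hypothetical name.

```python
# Documented cap on import workers.
MAX_WORKERS = 100

def recommended_worker_count(online_serving_nodes: int) -> int:
    """Roughly one import worker per 10 online serving nodes, capped at 100.

    With zero online serving nodes the job writes only to the offline
    store, so throughput depends only on the worker count; start small
    and tune upward based on observed performance.
    """
    if online_serving_nodes <= 0:
        return 1
    return max(1, min(MAX_WORKERS, online_serving_nodes // 10))
```

For example, a featurestore with 50 online serving nodes suggests 5 workers; anything above 1,000 nodes still caps at 100 workers.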
If the online serving cluster is under-provisioned, the import job might fail. In the event of a failure, retry the import request when the online serving load is low, or increase the node count of your featurestore and then retry the request.
If the featurestore doesn't have an online store (zero online serving nodes), the import job writes only to the offline store, and the performance of the job depends solely on the number of import workers.
Data consistency
Inconsistencies can be introduced if the source data is modified during import. Ensure that any source data modifications are complete before you start an import job. Also, duplicate feature values can result in different values being served between online and batch requests. Ensure that you have one feature value for each entity ID and timestamp pair.
If an import operation fails, the featurestore might only have partial data, which can lead to inconsistent values being returned between online and batch serving requests. To avoid this inconsistency, retry the same import request and wait until the request successfully completes.
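The one-value-per-(entity ID, timestamp) rule above is easy to verify against source data before importing. A minimal sketch, assuming rows have already been read from your CSV, Avro, or BigQuery source; `find_duplicate_keys` is an illustrative helper, not part of the SDK.

```python
from collections import Counter

def find_duplicate_keys(rows):
    """Return (entity_id, timestamp) pairs that appear more than once.

    rows: iterable of (entity_id, timestamp) tuples taken from the
    entity ID and feature time columns of the source data. Any pair
    returned here could cause online and batch serving to disagree.
    """
    counts = Counter(rows)
    return sorted(key for key, n in counts.items() if n > 1)
```

Running this before the import job and deduplicating the flagged rows keeps online and batch serving consistent.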
Null values and empty arrays
During import, Vertex AI Feature Store (Legacy) considers null scalar values or empty arrays as empty values. These include empty values in a CSV column. Vertex AI Feature Store (Legacy) doesn't support non-scalar null values, such as a null value in an array.

During online serving and batch serving, Vertex AI Feature Store (Legacy) returns the latest non-null or non-empty value of the feature. If a historical value of the feature isn't available, then Vertex AI Feature Store (Legacy) returns null.
NaN values
Vertex AI Feature Store (Legacy) supports NaN (Not a Number) values in Double and DoubleArray. During import, you can enter NaN in the serving input CSV file to represent a NaN value. During online serving and batch serving, Vertex AI Feature Store (Legacy) returns NaN for NaN values.
NaN is considered valid data only for the Double data type. For all other data types, NaN is considered an invalid feature value. In such a scenario, Vertex AI Feature Store (Legacy) invalidates and ignores the entire row.

Batch import
Import values in bulk into a featurestore for one or more features of a single entity type.
Web UI
- In the Vertex AI section of the Google Cloud console, go to the Features page.
- Select a region from the Region drop-down list.
- In the features table, view the Entity type column and find the entity type that contains the features that you want to import values for.
- Click the name of the entity type.
- From the action bar, click Ingest values.
- For Data source, select one of the following:
- Cloud Storage CSV file: Select this option to import data from multiple CSV files from Cloud Storage. Specify the path and name of the CSV file. To specify additional files, click Add another file.
- Cloud Storage AVRO file: Select this option to import data from an AVRO file from Cloud Storage. Specify the path and name of the AVRO file.
- BigQuery table: Select this option to import data from a BigQuery table or BigQuery view. Browse and select a table or view to use, which is in the following format:
PROJECT_ID.DATASET_ID.TABLE_ID
- Click Continue.
- For Map column to features, specify which columns in your source data map to entities and features in your featurestore.
- Specify the column name in your source data that contains the entity IDs.
- For the timestamp, specify a timestamp column in your source data or specify a single timestamp associated with all feature values that you import.
- In the list of features, enter the source data column name that maps to each feature. By default, Vertex AI Feature Store (Legacy) assumes that the feature name and column name match.
- Click Ingest.
REST
To import feature values for existing features, send a POST request by using the featurestores.entityTypes.importFeatureValues method. If the names of the source data columns and the destination feature IDs are different, include the sourceField parameter.
Before using any of the request data, make the following replacements:
- LOCATION_ID: Region where the featurestore is created. For example, us-central1.
- PROJECT_ID: Your project ID.
- FEATURESTORE_ID: ID of the featurestore.
- ENTITY_TYPE_ID: ID of the entity type.
- ENTITY_SOURCE_COLUMN_ID: ID of the source column that contains entity IDs.
- FEATURE_TIME_ID: ID of the source column that contains the feature timestamps for the feature values.
- FEATURE_ID: ID of an existing feature in the featurestore to import values for.
- FEATURE_SOURCE_COLUMN_ID: ID of the source column that contains feature values for the entities.
- SOURCE_DATA_DETAILS: The source data location, which also indicates the format, such as "bigquerySource": { "inputUri": "bq://test.dataset.sourcetable" } for a BigQuery table or BigQuery view.
- WORKER_COUNT: The number of workers to use to write data to the featurestore.
HTTP method and URL:
POST https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/featurestores/FEATURESTORE_ID/entityTypes/ENTITY_TYPE_ID:importFeatureValues
Request JSON body:
{
  "entityIdField": "ENTITY_SOURCE_COLUMN_ID",
  "featureTimeField": "FEATURE_TIME_ID",
  SOURCE_DATA_DETAILS,
  "featureSpecs": [{
    "id": "FEATURE_ID",
    "sourceField": "FEATURE_SOURCE_COLUMN_ID"
  }],
  "workerCount": WORKER_COUNT
}

To send your request, choose one of these options:
curl
Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you into the gcloud CLI. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command:
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/featurestores/FEATURESTORE_ID/entityTypes/ENTITY_TYPE_ID:importFeatureValues"
PowerShell
Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command:
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }
Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/featurestores/FEATURESTORE_ID/entityTypes/ENTITY_TYPE_ID:importFeatureValues" | Select-Object -Expand Content
You should see output similar to the following. You can use the OPERATION_ID in the response to get the status of the operation.
{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION_ID/featurestores/FEATURESTORE_ID/entityTypes/ENTITY_TYPE_ID/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.aiplatform.v1.ImportFeatureValuesOperationMetadata",
    "genericMetadata": {
      "createTime": "2021-03-02T00:04:13.039166Z",
      "updateTime": "2021-03-02T00:04:13.039166Z"
    }
  }
}

Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
import datetime
from typing import List, Union

from google.cloud import aiplatform


def import_feature_values_sample(
    project: str,
    location: str,
    entity_type_id: str,
    featurestore_id: str,
    feature_ids: List[str],
    feature_time: Union[str, datetime.datetime],
    gcs_source_uris: Union[str, List[str]],
    gcs_source_type: str,
):
    aiplatform.init(project=project, location=location)

    my_entity_type = aiplatform.featurestore.EntityType(
        entity_type_name=entity_type_id, featurestore_id=featurestore_id
    )

    my_entity_type.ingest_from_gcs(
        feature_ids=feature_ids,
        feature_time=feature_time,
        gcs_source_uris=gcs_source_uris,
        gcs_source_type=gcs_source_type,
    )

Java
Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
import com.google.api.gax.longrunning.OperationFuture;
import com.google.cloud.aiplatform.v1.AvroSource;
import com.google.cloud.aiplatform.v1.EntityTypeName;
import com.google.cloud.aiplatform.v1.FeaturestoreServiceClient;
import com.google.cloud.aiplatform.v1.FeaturestoreServiceSettings;
import com.google.cloud.aiplatform.v1.GcsSource;
import com.google.cloud.aiplatform.v1.ImportFeatureValuesOperationMetadata;
import com.google.cloud.aiplatform.v1.ImportFeatureValuesRequest;
import com.google.cloud.aiplatform.v1.ImportFeatureValuesRequest.FeatureSpec;
import com.google.cloud.aiplatform.v1.ImportFeatureValuesResponse;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ImportFeatureValuesSample {

  public static void main(String[] args)
      throws IOException, InterruptedException, ExecutionException, TimeoutException {
    // TODO(developer): Replace these variables before running the sample.
    String project = "YOUR_PROJECT_ID";
    String featurestoreId = "YOUR_FEATURESTORE_ID";
    String entityTypeId = "YOUR_ENTITY_TYPE_ID";
    String entityIdField = "YOUR_ENTITY_FIELD_ID";
    String featureTimeField = "YOUR_FEATURE_TIME_FIELD";
    String gcsSourceUri = "YOUR_GCS_SOURCE_URI";
    int workerCount = 2;
    String location = "us-central1";
    String endpoint = "us-central1-aiplatform.googleapis.com:443";
    int timeout = 300;
    importFeatureValuesSample(
        project,
        featurestoreId,
        entityTypeId,
        gcsSourceUri,
        entityIdField,
        featureTimeField,
        workerCount,
        location,
        endpoint,
        timeout);
  }

  static void importFeatureValuesSample(
      String project,
      String featurestoreId,
      String entityTypeId,
      String gcsSourceUri,
      String entityIdField,
      String featureTimeField,
      int workerCount,
      String location,
      String endpoint,
      int timeout)
      throws IOException, InterruptedException, ExecutionException, TimeoutException {
    FeaturestoreServiceSettings featurestoreServiceSettings =
        FeaturestoreServiceSettings.newBuilder().setEndpoint(endpoint).build();

    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (FeaturestoreServiceClient featurestoreServiceClient =
        FeaturestoreServiceClient.create(featurestoreServiceSettings)) {
      List<FeatureSpec> featureSpecs = new ArrayList<>();
      featureSpecs.add(FeatureSpec.newBuilder().setId("title").build());
      featureSpecs.add(FeatureSpec.newBuilder().setId("genres").build());
      featureSpecs.add(FeatureSpec.newBuilder().setId("average_rating").build());
      ImportFeatureValuesRequest importFeatureValuesRequest =
          ImportFeatureValuesRequest.newBuilder()
              .setEntityType(
                  EntityTypeName.of(project, location, featurestoreId, entityTypeId).toString())
              .setEntityIdField(entityIdField)
              .setFeatureTimeField(featureTimeField)
              .addAllFeatureSpecs(featureSpecs)
              .setWorkerCount(workerCount)
              .setAvroSource(
                  AvroSource.newBuilder()
                      .setGcsSource(GcsSource.newBuilder().addUris(gcsSourceUri)))
              .build();
      OperationFuture<ImportFeatureValuesResponse, ImportFeatureValuesOperationMetadata>
          importFeatureValuesFuture =
              featurestoreServiceClient.importFeatureValuesAsync(importFeatureValuesRequest);
      System.out.format(
          "Operation name: %s%n", importFeatureValuesFuture.getInitialFuture().get().getName());
      System.out.println("Waiting for operation to finish...");
      ImportFeatureValuesResponse importFeatureValuesResponse =
          importFeatureValuesFuture.get(timeout, TimeUnit.SECONDS);
      System.out.println("Import Feature Values Response");
      System.out.println(importFeatureValuesResponse);
      featurestoreServiceClient.close();
    }
  }
}

Node.js
Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
/**
 * TODO(developer): Uncomment these variables before running the sample.
 * (Not necessary if passing values as arguments)
 */
// const project = 'YOUR_PROJECT_ID';
// const featurestoreId = 'YOUR_FEATURESTORE_ID';
// const entityTypeId = 'YOUR_ENTITY_TYPE_ID';
// const avroGcsUri = 'AVRO_FILE_IN_THE_GCS_URI';
// const entityIdField = 'ENTITY_ID_FIELD_IN_AVRO';
// const featureTimeField = 'TIMESTAMP_FIELD_IN_AVRO';
// const workerCount = <NO_OF_WORKERS_FOR_INGESTION_JOB>;
// const location = 'YOUR_PROJECT_LOCATION';
// const apiEndpoint = 'YOUR_API_ENDPOINT';
// const timeout = <TIMEOUT_IN_MILLI_SECONDS>;

// Imports the Google Cloud Featurestore Service Client library
const {FeaturestoreServiceClient} = require('@google-cloud/aiplatform').v1;

// Specifies the location of the api endpoint
const clientOptions = {
  apiEndpoint: apiEndpoint,
};

// Instantiates a client
const featurestoreServiceClient = new FeaturestoreServiceClient(clientOptions);

async function importFeatureValues() {
  // Configure the entityType resource
  const entityType = `projects/${project}/locations/${location}/featurestores/${featurestoreId}/entityTypes/${entityTypeId}`;

  const avroSource = {
    gcsSource: {
      uris: [avroGcsUri],
    },
  };

  const featureSpecs = [{id: 'age'}, {id: 'gender'}, {id: 'liked_genres'}];

  const request = {
    entityType: entityType,
    avroSource: avroSource,
    entityIdField: entityIdField,
    featureSpecs: featureSpecs,
    featureTimeField: featureTimeField,
    workerCount: Number(workerCount),
  };

  // Import Feature Values Request
  const [operation] = await featurestoreServiceClient.importFeatureValues(
    request,
    {timeout: Number(timeout)}
  );
  const [response] = await operation.promise();

  console.log('Import feature values response');
  console.log('Raw response:');
  console.log(JSON.stringify(response, null, 2));
}
importFeatureValues();

View import jobs
Use the Google Cloud console to view batch import jobs in a Google Cloud project.
Note: For featurestores created during Preview, the Google Cloud console lists both batch import and batch serving jobs. Featurestores created on or after General Availability (October 5, 2021) show only batch import jobs. If you see both job types and just want to view batch import jobs, you must create and use a new featurestore.

Web UI
- In the Vertex AI section of the Google Cloud console, go to the Features page.
- Select a region from the Region drop-down list.
- From the action bar, click View ingestion jobs to list import jobs for all featurestores.
- Click the ID of an import job to view its details, such as its data source, number of imported entities, and number of feature values imported.
Overwrite existing data in a featurestore
You can re-import values to overwrite existing feature values if both have the same timestamps. You don't need to delete existing feature values first. For example, the underlying source data might have recently changed. To keep your featurestore consistent with that underlying data, import your feature values again. If the timestamps don't match, the imported values are considered unique and the old values continue to exist (they aren't overwritten).

To ensure consistency between online and batch serving requests, wait until the import job is complete before making any serving requests.
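The overwrite semantics above can be pictured as a store keyed by (entity ID, feature timestamp). This is an illustrative model only, not how Vertex AI Feature Store stores data internally.

```python
# Conceptual model of the offline store: values keyed by
# (entity ID, feature timestamp). Re-importing with the same key
# overwrites; a different timestamp adds a second version.
store = {}

def ingest(entity_id, timestamp, value):
    store[(entity_id, timestamp)] = value

ingest("movie_01", "2021-03-01T00:00:00Z", 4.2)
ingest("movie_01", "2021-03-01T00:00:00Z", 4.5)  # same timestamp: value replaced
ingest("movie_01", "2021-03-02T00:00:00Z", 4.7)  # new timestamp: kept alongside
```

After these three imports the store holds two values for movie_01: 4.5 at the first timestamp (the re-import replaced 4.2) and 4.7 at the second.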
Backfill historical data
If you're backfilling data, where you're importing past feature values, disable online serving for your import job. Online serving is for serving the latest feature values only, which backfilling doesn't include. Disabling online serving is useful because you eliminate any load on your online serving nodes and increase throughput for your import job, which can decrease its completion time.
You can disable online serving for import jobs when you use the API or client libraries. For more information, see the disableOnlineServing field for the importFeatureValues method.
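As a sketch, a backfill request body looks like the one in the REST section with the disableOnlineServing field added (placeholders follow the same conventions as the REST example above):

```json
{
  "entityIdField": "ENTITY_SOURCE_COLUMN_ID",
  "featureTimeField": "FEATURE_TIME_ID",
  "bigquerySource": { "inputUri": "bq://test.dataset.sourcetable" },
  "featureSpecs": [{ "id": "FEATURE_ID" }],
  "workerCount": WORKER_COUNT,
  "disableOnlineServing": true
}
```

With this flag set, the job writes only to the offline store, so import throughput no longer depends on online serving capacity.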
What's next
- Learn how to serve features through online serving or batch serving.
- Learn how to monitor imported feature values over time.
- View the Vertex AI Feature Store (Legacy) concurrent batch job quota.
- Troubleshoot common Vertex AI Feature Store (Legacy) issues.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-12-17 UTC.