Source data requirements
For batch import, Vertex AI Feature Store (Legacy) can import data from tables in BigQuery or files in Cloud Storage.
- Use a BigQuery table if you need to import the entire dataset and don't require partition filters.
- Use a BigQuery view if you need to import a specific subset of the dataset. This option is more time-efficient and lets you import specific selections from the entire dataset, including multiple tables generated from the data, as shown in the sketch below.
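If you go the view route, you can define the subset once in BigQuery and then point the import at the view. The following is a minimal sketch using the google-cloud-bigquery Python client; the project, dataset, table, and filter predicate are placeholders, not names from this page.

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder project ID

# Define a view that keeps only the subset of rows to import, so the batch
# import reads just this selection instead of the entire table.
view = bigquery.Table("my-project.movies_dataset.recent_movies_view")
view.view_query = """
    SELECT movie_id, average_rating, update_time
    FROM `my-project.movies_dataset.movies_table`
    WHERE update_time >= TIMESTAMP('2024-01-01')
"""
client.create_table(view)
```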
Data contained in files imported from Cloud Storage must be in AVRO or CSV format.
For streaming import, you provide the feature values to import in the API request. These source data requirements don't apply. For more information, see the writeFeatureValues API reference.
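For example, with the Vertex AI SDK for Python a streaming write looks roughly like the sketch below; the project, featurestore, entity type, and feature IDs are placeholders.

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

# Look up the entity type to write to (placeholder IDs).
entity_type = aiplatform.Featurestore("my_featurestore").get_entity_type("movies")

# The feature values are passed directly in the request payload as a mapping
# of entity ID -> {feature ID: value}; no source table or file is involved.
entity_type.write_feature_values(
    instances={
        "movie_01": {"title": "The Shawshank Redemption", "average_rating": 4.7},
        "movie_02": {"title": "The Shining", "average_rating": 4.3},
    }
)
```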
Each item (or row) must adhere to the following requirements:
- You must have a column for entity IDs, and the values must be of type STRING. This column contains the entity IDs that the feature values are for.
- Your source data value types must match the value types of the destination feature in the featurestore. For example, boolean values must be imported into a feature that is of type BOOL.
- All columns must have headers that are of type STRING. There are no restrictions on the names of the headers.
  - For BigQuery tables and BigQuery views, the column header is the column name.
  - For AVRO, the column header is defined by the AVRO schema that is associated with the binary data.
  - For CSV files, the column header is the first row.
- If you provide a column for feature generation timestamps, use one of the following timestamp formats:
  - For BigQuery tables and BigQuery views, timestamps must be in a column of type TIMESTAMP.
  - For Avro, timestamps must be of type long and logical type timestamp-micros.
  - For CSV files, timestamps must be in the RFC 3339 format.
- CSV files cannot include array data types. Use Avro or BigQuery instead.
- For array types, you cannot include a null value in the array. However, you can include an empty array.
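To make these requirements concrete, here is a minimal sketch that writes a CSV file satisfying them, using only the Python standard library; the column names and feature values are illustrative only.

```python
import csv
from datetime import datetime, timezone

# Entity IDs are strings, and update_time is the feature generation timestamp.
# No array values appear, since CSV files cannot carry array types.
rows = [
    ("movie_01", 4.7, datetime(2024, 4, 1, tzinfo=timezone.utc)),
    ("movie_02", 4.3, datetime(2024, 4, 1, tzinfo=timezone.utc)),
]

with open("movies.csv", "w", newline="") as f:
    writer = csv.writer(f)
    # For CSV files, the column header is the first row.
    writer.writerow(["movie_id", "average_rating", "update_time"])
    for movie_id, average_rating, update_time in rows:
        writer.writerow([
            movie_id,
            average_rating,
            # isoformat() on a timezone-aware datetime yields an RFC 3339
            # timestamp, for example 2024-04-01T00:00:00+00:00.
            update_time.isoformat(),
        ])
```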
Feature value timestamps
For batch import, Vertex AI Feature Store (Legacy) requires user-provided timestamps for the imported feature values. You can specify a particular timestamp for each value or specify the same timestamp for all values:
- If the timestamps for feature values are different, specify the timestamps in a column in your source data. Each row must have its own timestamp indicating when the feature value was generated. In your import request, you specify the column name to identify the timestamp column.
- If the timestamp for all feature values is the same, you can specify it as a parameter in your import request. You can also specify the timestamp in a column in your source data, where each row has the same timestamp. Both options are shown in the sketch after this list.
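Both options map to a single argument in the import request. As a sketch with the Vertex AI SDK for Python, the ingest_from_bq call accepts either a column name or a fixed datetime for feature_time; the project, featurestore, entity type, and column names below are placeholders.

```python
from datetime import datetime, timezone

from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")
entity_type = aiplatform.Featurestore("my_featurestore").get_entity_type("movies")

# Different timestamps per value: name the timestamp column in the source data.
entity_type.ingest_from_bq(
    feature_ids=["average_rating", "title"],
    feature_time="update_time",  # column holding per-row timestamps
    bq_source_uri="bq://my-project.movies_dataset.movies_table",
    entity_id_field="movie_id",
)

# Same timestamp for all values: pass one datetime for the whole import.
entity_type.ingest_from_bq(
    feature_ids=["average_rating", "title"],
    feature_time=datetime(2024, 4, 1, tzinfo=timezone.utc),
    bq_source_uri="bq://my-project.movies_dataset.movies_table",
    entity_id_field="movie_id",
)
```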
Data source region
If your source data is in either BigQuery or Cloud Storage, the source dataset or bucket must be in the same region or in the same multi-regional location as your featurestore. For example, a featurestore in us-central1 can import data only from Cloud Storage buckets or BigQuery datasets that are in us-central1 or in the US multi-region location. You can't import data from, for example, us-east1. Also, source data from dual-region buckets is not supported.
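If you want to catch a location mismatch before starting an import, a quick check along these lines can help. This is a sketch assuming the google-cloud-storage client; the bucket name, featurestore region, and the set of acceptable locations are placeholders you'd adapt.

```python
from google.cloud import storage

FEATURESTORE_REGION = "us-central1"  # placeholder: your featurestore's region
# Locations compatible with us-central1: the region itself or the US
# multi-region. Adjust this set for your own featurestore region.
ALLOWED_LOCATIONS = {FEATURESTORE_REGION.upper(), "US"}

bucket = storage.Client().get_bucket("my-source-bucket")  # placeholder bucket

# Bucket locations are reported uppercase, e.g. "US-CENTRAL1" or "US";
# dual-region buckets are rejected regardless of location.
if bucket.location not in ALLOWED_LOCATIONS or bucket.location_type == "dual-region":
    raise ValueError(
        f"Bucket in {bucket.location} ({bucket.location_type}) can't be used "
        f"with a featurestore in {FEATURESTORE_REGION}"
    )
```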
What's next
- Learn about setting up your project to use Vertex AI Feature Store (Legacy).
- Learn how to batch import feature values.