The ML.ENTITY_FEATURES_AT_TIME function
This document describes theML.ENTITY_FEATURES_AT_TIME function, which letsyou use multiple point-in-time cutoffs for multiple entities when retrievingfeatures, because features can have time dependencies if they includetime-sensitive data. To avoiddata leakage, usepoint-in-time features when training models and running inference.
Use this function to retrieve features from multiple entities for multiplepoints in time. For example, you could retrieve features created at or beforethree different points in time for entity 1, and features created at or beforeyet another different point in time for entity 2. Use theML.FEATURES_AT_TIME functionto use the same point-in-time cutoff for all entities when retrieving features.
Syntax
ML.ENTITY_FEATURES_AT_TIME( { TABLE `PROJECT_ID.DATASET.FEATURE_TABLE_NAME` | (FEATURE_QUERY_STATEMENT) }, { TABLE `PROJECT_ID.DATASET.ENTITY_TIME_TABLE_NAME` | (ENTITY_TIME_QUERY_STATEMENT) }, [,NUM_ROWS => INT64][,IGNORE_FEATURE_NULLS => BOOL])Arguments
ML.ENTITY_FEATURES_AT_TIME takes the following arguments:
PROJECT_ID: the project that contains theresource.DATASET: the BigQuery dataset thatcontains the resource.FEATURE_TABLE_NAME: aSTRINGvalue that specifiesthe name of the BigQuery table that contains the feature data.The feature table must contain the following columns:entity_id: aSTRINGcolumn that contains the ID of the entityrelated to the features.- One or more feature columns.
feature_timestamp: aTIMESTAMPcolumn that identifies when thefeature data was last updated.
The column names are case-insensitive. For example, you can use a columnnamed
Entity_IDinstead ofentity_id.The feature table must be inwide format,with one column for each feature.
FEATURE_QUERY_STATEMENT: aSTRINGvalue thatspecifies a GoogleSQL query that returns the feature data. Thisquery must return the same columns asfeature_table. SeeGoogleSQL query syntaxfor the supported SQL syntax of theFEATURE_QUERY_STATEMENTclause.ENTITY_TIME_TABLE_NAME: aSTRINGvalue thatspecifies the name of the BigQuery table that maps entity IDsto feature lookup times. The entity time table must contain thefollowing columns:entity_id: aSTRINGcolumn that contains the entity ID.time: aTIMESTAMPcolumn that identifies a point in time to use as acutoff time when selecting features for the entity represented by theentity ID.
The column names are case-insensitive. For example, you can use a columnnamed
Entity_IDinstead ofentity_id.The table identified by the
ENTITY_TIME_TABLE_NAMEvalue must be nolarger than 100 MB.ENTITY_TIME_QUERY_STATEMENT: aSTRINGvalue thatspecifies a GoogleSQL query that returns the entity time data. Thisquery must return the same columns as theENTITY_TIME_TABLE_NAMEargument.SeeGoogleSQL query syntaxfor the supported SQL syntax of theENTITY_TIME_QUERY_STATEMENTclause.NUM_ROWS: anINT64value that specifies the numberof rows to return for each row inENTITY_TIME_TABLE_NAME. Defaults to1.IGNORE_FEATURE_NULLS: aBOOLvalue that indicateswhether to replace aNULLvalue in a feature column with the feature columnvalue from the row for the same entity that immediately precedes it in time.For example, for the following feature table and entity time table:Feature table
+-----------+------+------+--------------------------+| entity_id | f1 | f2 | feature_timestamp |+-----------+------+------+--------------------------+| '2' | 5.0 | 8.0 | '2022-06-10 09:00:00+00' |+-----------+------+------+--------------------------+| '2' | 2.0 | 4.0 | '2022-06-10 12:00:00+00' |+-----------+------+------+--------------------------+| '2' | 7.0 | NULL | '2022-06-11 10:00:00+00' |+-----------+------+------+--------------------------+
Entity time table
+-----------+--------------------------+| entity_id | time |+-----------+--------------------------+| '2' | '2022-06-11 10:00:00+00' |+-----------+--------------------------+
Running this query:
SELECT*FROMML.ENTITY_FEATURES_AT_TIME(TABLEmydataset.feature_table,TABLEmydataset.entity_time_table,num_rows=>1,ignore_feature_nulls=>TRUE);
Results in the following output, where the
f2value from therow for entity ID 2 that is timestamped'2022-06-10 12:00:00+00'issubstituted for theNULLvalue in the row timestamped'2022-06-11 10:00:00+00':+-----------+------+------+--------------------------+| entity_id | f1 | f2 | feature_timestamp |+-----------+------+------+--------------------------+| '2' | 7.0 | 4.0 | '2022-06-11 10:00:00+00' |+-----------+------+------+--------------------------+
If there is no available replacement value, for example, where there is noearlier row for that entity ID, a
NULLvalue is returned.Defaults to
FALSE.
Output
ML.ENTITY_FEATURES_AT_TIME returns the input table rows that meet thepoint-in-time cutoff criteria, with thefeature_timestamp column showing thetimestamp from thetime column of the entity time table.
Because you can specify multiple points in time from which to retrieve featuresfor the same entity, it is possible to return duplicate rows, depending on thetimestamps in the feature and entity time tables, and thenum_rows valueyou specify. For example, if the only row in the feature table for entity ID 1is timestamped2022-06-11 10:00:00+00, and you have two rows for entity ID 1in the entity time table that both have later timestamps, the function outputhas 2 rows with the same feature data for entity ID 1.
If either of the following conditions are true:
- No entity ids from the entity time table are found in the feature table.
- The rows in the feature table whose entity ids match those inthe entity time table don't meet the point-in-time criteria.
Then the function doesn't return any output for that entitytime table row.
Examples
Example 1
This example shows a how to retrain a model using only features that werecreated or updated before the timestamps identified inmydataset.entity_time_table:
CREATEORREPLACE`mydataset.mymodel`OPTIONS(WARM_START=TRUE)ASSELECT*EXCEPT(feature_timestamp,entity_id)FROMML.ENTITY_FEATURES_AT_TIME(TABLE`mydataset.feature_table`,TABLE`mydataset.entity_time_table`,num_rows=>1,ignore_feature_nulls=>TRUE);
Example 2
This example shows a how to get predictions from a model based on featuresthat were created or updated before the timestamps identified inmydataset.entity_time_table:
SELECT*FROMML.PREDICT(MODEL`mydataset.mymodel`,(SELECT*EXCEPT(feature_timestamp,entity_id)FROMML.ENTITY_FEATURES_AT_TIME(TABLE`mydataset.feature_table`,TABLE`mydataset.entity_time_table`,num_rows=>1,ignore_feature_nulls=>TRUE)));
Example 3
This is a contrived example that you can use to see the output of thefunction:
WITHfeature_tableAS(SELECT*FROMUNNEST(ARRAY<STRUCT<entity_idSTRING,f_1FLOAT64,f_2FLOAT64,feature_timestampTIMESTAMP>>[('id1',1.0,1.0,TIMESTAMP'2022-06-10 12:00:00+00'),('id2',12.0,24.0,TIMESTAMP'2022-06-11 12:00:00+00'),('id1',11.0,NULL,TIMESTAMP'2022-06-11 12:00:00+00'),('id1',6.0,12.0,TIMESTAMP'2022-06-11 10:00:00+00'),('id2',2.0,4.0,TIMESTAMP'2022-06-10 12:00:00+00'),('id2',7.0,NULL,TIMESTAMP'2022-06-11 10:00:00+00')])),entity_time_tableAS(SELECT*FROMUNNEST(ARRAY<STRUCT<entity_idSTRING,timeTIMESTAMP>>[('id1',TIMESTAMP'2022-06-12 12:00:00+00'),('id2',TIMESTAMP'2022-06-11 10:00:00+00'),('id1',TIMESTAMP'2022-06-10 13:00:00+00')]))SELECT*FROMML.ENTITY_FEATURES_AT_TIME(TABLEfeature_table,TABLEentity_time_table,num_rows=>1,ignore_feature_nulls=>TRUE);
Except as otherwise noted, the content of this page is licensed under theCreative Commons Attribution 4.0 License, and code samples are licensed under theApache 2.0 License. For details, see theGoogle Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-12-15 UTC.