The ML.ENTITY_FEATURES_AT_TIME function

This document describes theML.ENTITY_FEATURES_AT_TIME function, which letsyou use multiple point-in-time cutoffs for multiple entities when retrievingfeatures, because features can have time dependencies if they includetime-sensitive data. To avoiddata leakage, usepoint-in-time features when training models and running inference.

Use this function to retrieve features from multiple entities for multiplepoints in time. For example, you could retrieve features created at or beforethree different points in time for entity 1, and features created at or beforeyet another different point in time for entity 2. Use theML.FEATURES_AT_TIME functionto use the same point-in-time cutoff for all entities when retrieving features.

Syntax

ML.ENTITY_FEATURES_AT_TIME(   { TABLE `PROJECT_ID.DATASET.FEATURE_TABLE_NAME` | (FEATURE_QUERY_STATEMENT) },   { TABLE `PROJECT_ID.DATASET.ENTITY_TIME_TABLE_NAME` | (ENTITY_TIME_QUERY_STATEMENT) },   [,NUM_ROWS => INT64][,IGNORE_FEATURE_NULLS => BOOL])

Arguments

ML.ENTITY_FEATURES_AT_TIME takes the following arguments:

  • PROJECT_ID: the project that contains theresource.
  • DATASET: the BigQuery dataset thatcontains the resource.
  • FEATURE_TABLE_NAME: aSTRING value that specifiesthe name of the BigQuery table that contains the feature data.The feature table must contain the following columns:

    • entity_id: aSTRING column that contains the ID of the entityrelated to the features.
    • One or more feature columns.
    • feature_timestamp: aTIMESTAMP column that identifies when thefeature data was last updated.

    The column names are case-insensitive. For example, you can use a columnnamedEntity_ID instead ofentity_id.

    The feature table must be inwide format,with one column for each feature.

  • FEATURE_QUERY_STATEMENT: aSTRING value thatspecifies a GoogleSQL query that returns the feature data. Thisquery must return the same columns asfeature_table. SeeGoogleSQL query syntaxfor the supported SQL syntax of theFEATURE_QUERY_STATEMENT clause.

  • ENTITY_TIME_TABLE_NAME: aSTRING value thatspecifies the name of the BigQuery table that maps entity IDsto feature lookup times. The entity time table must contain thefollowing columns:

    • entity_id: aSTRING column that contains the entity ID.
    • time: aTIMESTAMP column that identifies a point in time to use as acutoff time when selecting features for the entity represented by theentity ID.

    The column names are case-insensitive. For example, you can use a columnnamedEntity_ID instead ofentity_id.

    The table identified by theENTITY_TIME_TABLE_NAME value must be nolarger than 100 MB.

  • ENTITY_TIME_QUERY_STATEMENT: aSTRING value thatspecifies a GoogleSQL query that returns the entity time data. Thisquery must return the same columns as theENTITY_TIME_TABLE_NAME argument.SeeGoogleSQL query syntaxfor the supported SQL syntax of theENTITY_TIME_QUERY_STATEMENT clause.

  • NUM_ROWS: anINT64 value that specifies the numberof rows to return for each row inENTITY_TIME_TABLE_NAME. Defaults to1.

  • IGNORE_FEATURE_NULLS: aBOOL value that indicateswhether to replace aNULL value in a feature column with the feature columnvalue from the row for the same entity that immediately precedes it in time.For example, for the following feature table and entity time table:

    Feature table

    +-----------+------+------+--------------------------+| entity_id | f1   | f2   | feature_timestamp        |+-----------+------+------+--------------------------+| '2'       | 5.0  | 8.0  | '2022-06-10 09:00:00+00' |+-----------+------+------+--------------------------+| '2'       | 2.0  | 4.0  | '2022-06-10 12:00:00+00' |+-----------+------+------+--------------------------+| '2'       | 7.0  | NULL | '2022-06-11 10:00:00+00' |+-----------+------+------+--------------------------+

    Entity time table

    +-----------+--------------------------+| entity_id | time                     |+-----------+--------------------------+| '2'       | '2022-06-11 10:00:00+00' |+-----------+--------------------------+

    Running this query:

    SELECT*FROMML.ENTITY_FEATURES_AT_TIME(TABLEmydataset.feature_table,TABLEmydataset.entity_time_table,num_rows=>1,ignore_feature_nulls=>TRUE);

    Results in the following output, where thef2 value from therow for entity ID 2 that is timestamped'2022-06-10 12:00:00+00' issubstituted for theNULL value in the row timestamped'2022-06-11 10:00:00+00':

    +-----------+------+------+--------------------------+| entity_id | f1   | f2   | feature_timestamp        |+-----------+------+------+--------------------------+| '2'       | 7.0  | 4.0  | '2022-06-11 10:00:00+00' |+-----------+------+------+--------------------------+

    If there is no available replacement value, for example, where there is noearlier row for that entity ID, aNULL value is returned.

    Defaults toFALSE.

Output

ML.ENTITY_FEATURES_AT_TIME returns the input table rows that meet thepoint-in-time cutoff criteria, with thefeature_timestamp column showing thetimestamp from thetime column of the entity time table.

Because you can specify multiple points in time from which to retrieve featuresfor the same entity, it is possible to return duplicate rows, depending on thetimestamps in the feature and entity time tables, and thenum_rows valueyou specify. For example, if the only row in the feature table for entity ID 1is timestamped2022-06-11 10:00:00+00, and you have two rows for entity ID 1in the entity time table that both have later timestamps, the function outputhas 2 rows with the same feature data for entity ID 1.

If either of the following conditions are true:

  • No entity ids from the entity time table are found in the feature table.
  • The rows in the feature table whose entity ids match those inthe entity time table don't meet the point-in-time criteria.

Then the function doesn't return any output for that entitytime table row.

Examples

Example 1

This example shows a how to retrain a model using only features that werecreated or updated before the timestamps identified inmydataset.entity_time_table:

CREATEORREPLACE`mydataset.mymodel`OPTIONS(WARM_START=TRUE)ASSELECT*EXCEPT(feature_timestamp,entity_id)FROMML.ENTITY_FEATURES_AT_TIME(TABLE`mydataset.feature_table`,TABLE`mydataset.entity_time_table`,num_rows=>1,ignore_feature_nulls=>TRUE);

Example 2

This example shows a how to get predictions from a model based on featuresthat were created or updated before the timestamps identified inmydataset.entity_time_table:

SELECT*FROMML.PREDICT(MODEL`mydataset.mymodel`,(SELECT*EXCEPT(feature_timestamp,entity_id)FROMML.ENTITY_FEATURES_AT_TIME(TABLE`mydataset.feature_table`,TABLE`mydataset.entity_time_table`,num_rows=>1,ignore_feature_nulls=>TRUE)));

Example 3

This is a contrived example that you can use to see the output of thefunction:

WITHfeature_tableAS(SELECT*FROMUNNEST(ARRAY<STRUCT<entity_idSTRING,f_1FLOAT64,f_2FLOAT64,feature_timestampTIMESTAMP>>[('id1',1.0,1.0,TIMESTAMP'2022-06-10 12:00:00+00'),('id2',12.0,24.0,TIMESTAMP'2022-06-11 12:00:00+00'),('id1',11.0,NULL,TIMESTAMP'2022-06-11 12:00:00+00'),('id1',6.0,12.0,TIMESTAMP'2022-06-11 10:00:00+00'),('id2',2.0,4.0,TIMESTAMP'2022-06-10 12:00:00+00'),('id2',7.0,NULL,TIMESTAMP'2022-06-11 10:00:00+00')])),entity_time_tableAS(SELECT*FROMUNNEST(ARRAY<STRUCT<entity_idSTRING,timeTIMESTAMP>>[('id1',TIMESTAMP'2022-06-12 12:00:00+00'),('id2',TIMESTAMP'2022-06-11 10:00:00+00'),('id1',TIMESTAMP'2022-06-10 13:00:00+00')]))SELECT*FROMML.ENTITY_FEATURES_AT_TIME(TABLEfeature_table,TABLEentity_time_table,num_rows=>1,ignore_feature_nulls=>TRUE);

Except as otherwise noted, the content of this page is licensed under theCreative Commons Attribution 4.0 License, and code samples are licensed under theApache 2.0 License. For details, see theGoogle Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-12-15 UTC.