Derived tables in Looker

In Looker, aderived table is a query whose results are used as if the query were an actual table in the database.

For example, you might have a database table calledorders that has many columns. You want to compute some customer-level aggregate metrics, such as how many orders each customer has placed or when each customer placed their first order. Using either anative derived table or aSQL-based derived table, you can create a new database table namedcustomer_order_summary that includes these metrics.

You can then work with thecustomer_order_summary derived table as if it were any other table in the database.

For popular use cases of derived tables, visitLooker cookbooks: Getting the most out of derived tables in Looker.

Native derived tables and SQL-based derived tables

To create a derived table in your Looker project, use thederived_table parameter under aview parameter. Inside thederived_table parameter, you can define the query for the derived table in one of two ways:

For example, the following view files show how you could use LookML to create a view from acustomer_order_summary derived table. The two versions of the LookML illustrate how you can create equivalent derived tables using either LookML or SQL to define the query for the derived table:

  • The native derived table defines the query with LookML in theexplore_source parameter. In this example, the query is based on an existingorders view, which is defined in a separate file that is not shown in this example. Theexplore_source query in the native derived table brings in thecustomer_id,first_order, andtotal_amount fields from theorders view file.
  • The SQL-based derived table defines the query using SQL in thesql parameter. In this example, the SQL query is a direct query of theorders table in the database.
Native derived table version
view: customer_order_summary {  derived_table: {    explore_source: orders {      column: customer_id {        field: orders.customer_id      }      column: first_order {        field: orders.first_order      }      column: total_amount {        field: orders.total_amount      }    }  }  dimension: customer_id {    type: number    primary_key: yes    sql: ${TABLE}.customer_id ;;  }  dimension_group: first_order {    type: time    timeframes: [date, week, month]    sql: ${TABLE}.first_order ;;  }  dimension: total_amount {    type: number    value_format: "0.00"    sql: ${TABLE}.total_amount ;;  }}
SQL-based derived table version
view: customer_order_summary {  derived_table: {    sql:      SELECT        customer_id,        MIN(DATE(time)) AS first_order,        SUM(amount) AS total_amount      FROM        orders      GROUP BY        customer_id ;;  }  dimension: customer_id {    type: number    primary_key: yes    sql: ${TABLE}.customer_id ;;  }  dimension_group: first_order {    type: time    timeframes: [date, week, month]    sql: ${TABLE}.first_order ;;  }  dimension: total_amount {    type: number    value_format: "0.00"    sql: ${TABLE}.total_amount ;;  }}

Both versions create a view calledcustomer_order_summary that is based on theorders table, with the columnscustomer_id,first_order, andtotal_amount.

Other than thederived_table parameter and its subparameters, thiscustomer_order_summary view works just like any otherview file. Whether you define the derived table's query with LookML or with SQL, you can create LookML measures and dimensions that are based on the columns of the derived table.

Once you define your derived table, you can use it like any other table in your database.

Native derived tables

Native derived tables are based on queries that you define using LookML terms. To create a native derived table, you use theexplore_source parameter inside thederived_table parameter of aview parameter. You create the columns of your native derived table by referring to the LookML dimensions or measures in your model. See the native derived table view file in theprevious example.

Compared to SQL-based derived tables, native derived tables are much easier to read and understand as you model your data.

See theCreating native derived tables documentation page for details on creating native derived tables.

SQL-based derived tables

To create a SQL-based derived table, you define a query in SQL terms, creating columns in the table using a SQL query. You cannot refer to LookML dimensions and measures in a SQL-based derived table. See the SQL-based derived table view file in theprevious example.

Most commonly, you define the SQL query using thesql parameter inside thederived_table parameter of aview parameter.

A helpful shortcut for creating SQL-based queries in Looker is touse SQL Runner to create the SQL query and turn it into a derived table definition.

Certain edge cases won't permit the use of thesql parameter. In such cases, Looker supports the following parameters for defining a SQL query forpersistent derived tables (PDTs):

  • create_process: When you use thesql parameter for a PDT, in the background Looker wraps the dialect'sCREATE TABLEData Definition Language (DDL) statement around your query to create the PDT from your SQL query. Some dialects don't support a SQLCREATE TABLE statement in a single step. For these dialects, you cannot create a PDT with thesql parameter. Instead, you can use thecreate_process parameter to create a PDT in multiple steps. See thecreate_process parameter documentation page for information and examples.
  • sql_create: If your use case requires customDDL commands and your dialect supports DDL (for example, the Google predictiveBigQuery ML), you can use thesql_create parameter to create a PDT instead of using thesql parameter. See thesql_create documentation page for information and examples.

Whether you are using thesql,create_process, orsql_create parameter, in all these cases you are defining the derived table with a SQL query, so these are all considered SQL-based derived tables.

When you define a SQL-based derived table, make sure to give each column a clean alias by usingAS. This is because you will need to reference the column names of your result set in your dimensions, such as${TABLE}.first_order. This is why theprevious example usesMIN(DATE(time)) AS first_order instead of justMIN(DATE(time)).

Note: To be used as anincremental PDT, a SQL-based derived table must be defined using thesql parameter. SQL-based derived tables that are defined with thesql_create parameter or thecreate_process parameter cannot be incrementally built.

Temporary and persistent derived tables

In addition to the distinction between native derived tables and SQL-based derived tables, there is also a distinction between atemporary derived table — which is not written to the database — and apersistent derived table (PDT) — which is written to a schema on your database.

Native derived tables and SQL-based derived tables can be either temporary or persistent.

Temporary derived tables

Thederived tables shown previously are examples oftemporary derived tables. They are temporary because there is nopersistence strategy defined in thederived_table parameter.

Temporary derived tables are not written to the database. When a user runs an Explore query that involves one or more derived tables, Looker constructs a SQL query by using a dialect-specific combination of the SQL for the derived table(s) plus the requested fields, joins, and filter values. If the combination has been run before and the results are still valid in the cache, Looker uses the cached results. See theCaching queries documentation page for more information on query caching in Looker.

Otherwise, if Looker can't use cached results, Looker must run a new query on your database every time a user requests data from a temporary derived table. Because of this, you should be sure that your temporary derived tables are performant and won't put excessive strain on your database. In cases where the query will take some time to run, aPDT is often a better option.

Supported database dialects for temporary derived tables

For Looker to support derived tables in your Looker project, your database dialect must also support them. The following table shows which dialects support derived tables in the latest release of Looker:

Click here to display the table.

DialectSupported?
Actian Avalanche
Amazon Athena
Amazon Aurora MySQL
Amazon Redshift
Amazon Redshift 2.1+
Amazon Redshift Serverless 2.1+
Apache Druid
Apache Druid 0.13+
Apache Druid 0.18+
Apache Hive 2.3+
Apache Hive 3.1.2+
Apache Spark 3+
ClickHouse
Cloudera Impala 3.1+
Cloudera Impala 3.1+ with Native Driver
Cloudera Impala with Native Driver
DataVirtuality
Databricks
Denodo 7
Denodo 8 & 9
Dremio
Dremio 11+
Exasol
Google BigQuery Legacy SQL
Google BigQuery Standard SQL
Google Cloud AlloyDB for PostgreSQL
Google Cloud PostgreSQL
Google Cloud SQL
Google Spanner
Greenplum
HyperSQL
IBM Netezza
MariaDB
Microsoft Azure PostgreSQL
Microsoft Azure SQL Database
Microsoft Azure Synapse Analytics
Microsoft SQL Server 2008+
Microsoft SQL Server 2012+
Microsoft SQL Server 2016
Microsoft SQL Server 2017+
MongoBI
MySQL
MySQL 8.0.12+
Oracle
Oracle ADWC
PostgreSQL 9.5+
PostgreSQL pre-9.5
PrestoDB
PrestoSQL
SAP HANA
SAP HANA 2+
SingleStore
SingleStore 7+
Snowflake
Teradata
Trino
Vector
Vertica

Persistent derived tables

A persistent derived table (PDT) is a derived table that is written into a scratch schema on your database and regenerated on the schedule that you specify with apersistence strategy.

A PDT can be either anative derived table or aSQL-based derived table.

Requirements for PDTs

To use persistent derived tables (PDTs) in your Looker project, you need the following:

Supported database dialects for PDTs

For Looker to support PDTs in your Looker project, your database dialect must also support them.

To support any type of PDTs (either LookML-based or SQL-based), the dialect must support writes to the database,among other requirements. There are some read-only database configurations that don't allow persistence to work (most commonly Postgres hot-swap replica databases). In these cases, you can usetemporary derived tables instead.

The following table shows the dialects that support persistentSQL-based derived tables in the latest release of Looker:

Click here to display the table.

DialectSupported?
Actian Avalanche
Amazon Athena
Amazon Aurora MySQL
Amazon Redshift
Amazon Redshift 2.1+
Amazon Redshift Serverless 2.1+
Apache Druid
Apache Druid 0.13+
Apache Druid 0.18+
Apache Hive 2.3+
Apache Hive 3.1.2+
Apache Spark 3+
ClickHouse
Cloudera Impala 3.1+
Cloudera Impala 3.1+ with Native Driver
Cloudera Impala with Native Driver
DataVirtuality
Databricks
Denodo 7
Denodo 8 & 9
Dremio
Dremio 11+
Exasol
Google BigQuery Legacy SQL
Google BigQuery Standard SQL
Google Cloud AlloyDB for PostgreSQL
Google Cloud PostgreSQL
Google Cloud SQL
Google Spanner
Greenplum
HyperSQL
IBM Netezza
MariaDB
Microsoft Azure PostgreSQL
Microsoft Azure SQL Database
Microsoft Azure Synapse Analytics
Microsoft SQL Server 2008+
Microsoft SQL Server 2012+
Microsoft SQL Server 2016
Microsoft SQL Server 2017+
MongoBI
MySQL
MySQL 8.0.12+
Oracle
Oracle ADWC
PostgreSQL 9.5+
PostgreSQL pre-9.5
PrestoDB
PrestoSQL
SAP HANA
SAP HANA 2+
SingleStore
SingleStore 7+
Snowflake
Teradata
Trino
Vector
Vertica

To support persistent native derived tables (which have LookML-based queries), the dialect must also support aCREATE TABLE DDL function. Here is a list of the dialects that support persistentnative (LookML-based) derived tables in the latest release of Looker:

Click here to display the table.

DialectSupported?
Actian Avalanche
Amazon Athena
Amazon Aurora MySQL
Amazon Redshift
Amazon Redshift 2.1+
Amazon Redshift Serverless 2.1+
Apache Druid
Apache Druid 0.13+
Apache Druid 0.18+
Apache Hive 2.3+
Apache Hive 3.1.2+
Apache Spark 3+
ClickHouse
Cloudera Impala 3.1+
Cloudera Impala 3.1+ with Native Driver
Cloudera Impala with Native Driver
DataVirtuality
Databricks
Denodo 7
Denodo 8 & 9
Dremio
Dremio 11+
Exasol
Google BigQuery Legacy SQL
Google BigQuery Standard SQL
Google Cloud AlloyDB for PostgreSQL
Google Cloud PostgreSQL
Google Cloud SQL
Google Spanner
Greenplum
HyperSQL
IBM Netezza
MariaDB
Microsoft Azure PostgreSQL
Microsoft Azure SQL Database
Microsoft Azure Synapse Analytics
Microsoft SQL Server 2008+
Microsoft SQL Server 2012+
Microsoft SQL Server 2016
Microsoft SQL Server 2017+
MongoBI
MySQL
MySQL 8.0.12+
Oracle
Oracle ADWC
PostgreSQL 9.5+
PostgreSQL pre-9.5
PrestoDB
PrestoSQL
SAP HANA
SAP HANA 2+
SingleStore
SingleStore 7+
Snowflake
Teradata
Trino
Vector
Vertica
Note: PDTs are not supported forSnowflake connections that use OAuth.

Incrementally building PDTs

Anincremental PDT is apersistent derived table that Looker builds by appending fresh data to the table instead of rebuilding it in its entirety.

If yourdialect supports incremental PDTs, and your PDT uses a trigger-based persistence strategy (datagroup_trigger,sql_trigger_value, orinterval_trigger), you candefine the PDT as an incremental PDT.

Note: Incremental PDTs are not supported for PDTs that use thepersist_for persistence strategy.

See theIncremental PDTs documentation page for more information.

Supported database dialects for incremental PDTs

For Looker to support incremental PDTs in your Looker project, your database dialect must also support them. The following table shows which dialects support incremental PDTs in the latest release of Looker:

Click here to display the table.

DialectSupported?
Actian Avalanche
Amazon Athena
Amazon Aurora MySQL
Amazon Redshift
Amazon Redshift 2.1+
Amazon Redshift Serverless 2.1+
Apache Druid
Apache Druid 0.13+
Apache Druid 0.18+
Apache Hive 2.3+
Apache Hive 3.1.2+
Apache Spark 3+
ClickHouse
Cloudera Impala 3.1+
Cloudera Impala 3.1+ with Native Driver
Cloudera Impala with Native Driver
DataVirtuality
Databricks
Denodo 7
Denodo 8 & 9
Dremio
Dremio 11+
Exasol
Google BigQuery Legacy SQL
Google BigQuery Standard SQL
Google Cloud AlloyDB for PostgreSQL
Google Cloud PostgreSQL
Google Cloud SQL
Google Spanner
Greenplum
HyperSQL
IBM Netezza
MariaDB
Microsoft Azure PostgreSQL
Microsoft Azure SQL Database
Microsoft Azure Synapse Analytics
Microsoft SQL Server 2008+
Microsoft SQL Server 2012+
Microsoft SQL Server 2016
Microsoft SQL Server 2017+
MongoBI
MySQL
MySQL 8.0.12+
Oracle
Oracle ADWC
PostgreSQL 9.5+
PostgreSQL pre-9.5
PrestoDB
PrestoSQL
SAP HANA
SAP HANA 2+
SingleStore
SingleStore 7+
Snowflake
Teradata
Trino
Vector
Vertica

Creating PDTs

To make a derived table into a persistent derived table (PDT), you define apersistence strategy for the table. To optimize performance, you should also add anoptimization strategy.

Persistence strategies

The persistence of a derived table can be managed by Looker or, fordialects that support materialized views, by your database usingmaterialized views.

To make a derived table persistent, add one of the following parameters to thederived_table definition:

With trigger-based persistence strategies (datagroup_trigger,sql_trigger_value, andinterval_trigger), Looker maintains the PDT in the database until the PDT is triggered for rebuild. When the PDT is triggered, Looker rebuilds the PDT to replace the previous version. This means that, with trigger-based PDTs, your users won't have to wait for the PDT to be built in order to get answers for Explore queries from the PDT.

Note:Certain users can override the persistence setting of a PDT. A user withdevelop permission can use theRebuild Derived Tables & Run option from an Explore's menu to override the persistence setting and rebuild all the PDTs required for the current query in the Explore. See theManually rebuilding PDTs for a query section on this page for details.

datagroup_trigger

Datagroups are the most flexible method of creating persistence. If you have defined adatagroup withsql_trigger orinterval_trigger, you can use thedatagroup_trigger parameter to initiate the rebuilding of your persistent derived tables (PDTs).

Looker maintains the PDT in the database until its datagroup is triggered. When the datagroup is triggered, Looker rebuilds the PDT to replace the previous version. This means that, in most cases, your users won't have to wait for the PDT to be built. If a user requests data from the PDT while it is being built and the query results aren't in the cache, Looker will return data from the existing PDT until the new PDT is built. SeeCaching queries for an overview of datagroups.

See the section onThe Looker regenerator for more information on how the regenerator builds PDTs.

sql_trigger_value

Thesql_trigger_value parameter triggers the regeneration of a persistent derived table (PDT) that is based on a SQL statement that you provide. If the result of the SQL statement is different from the previous value, the PDT is regenerated. Otherwise, the existing PDT is maintained in the database. This means that, in most cases, your users won't have to wait for the PDT to be built. If a user requests data from the PDT while it is being built, and the query results aren't in the cache, Looker will return data from the existing PDT until the new PDT is built.

See the section onThe Looker regenerator for more information on how the regenerator builds PDTs.

interval_trigger

Theinterval_trigger parameter triggers the regeneration of a persistent derived table (PDT) based on a time interval that you provide, such as"24 hours" or"60 minutes". Similar to thesql_trigger parameter, this means that usually the PDT will be prebuilt when your users query it. If a user requests data from the PDT while it is being built, and the query results aren't in the cache, Looker will return data from the existing PDT until the new PDT is built.

persist_for

Yet another option is to use thepersist_for parameter to set the length of time the derived table should be stored before it is marked as expired, so that it is no longer used for queries and will be dropped from the database.

Apersist_for persistent derived table (PDT) is built when a user first runs a query on it. Looker then maintains the PDT in the database for the length of time that is specified in the PDT'spersist_for parameter. If a user queries the PDT within thepersist_for time, Looker uses cached results if possible or else runs the query on the PDT.

After thepersist_for time, Looker clears the PDT from your database, and the PDT will be rebuilt the next time a user queries it, which means that the query will need to wait for the rebuild.

PDTs that usepersist_for aren't automatically rebuilt by the Lookerregenerator, except in the case of a dependencycascade of PDTs. When apersist_for table is part of a dependency cascade with trigger-based PDTs (PDTs that use thedatagroup_trigger,interval_trigger, orsql_trigger_value persistence strategy), the regenerator will monitor and rebuild thepersist_for table in order to rebuild other tables in the cascade. See theHow Looker builds cascading derived tables section on this page.

materialized_view: yes

Note: The materialized view functionality is an advanced feature. Depending on your dialect, a materialized view can consume large resources, so it is important that you understand your dialect's implementation of materialized views. See your dialect's documentation for information on the behavior of your dialect and the frequency with which the dialect refreshes data for materialized views.

Materialized views allow you to use your database's functionality to persist derived tables in your Looker project. If your database dialectsupports materialized views and yourLooker connection is configured with theEnable PDTs toggle turned on, you can create a materialized view by specifyingmaterialized_view: yes for a derived table. Materialized views are supported for bothnative derived tables andSQL-based derived tables.

Similar to apersistent derived table (PDT), a materialized view is a query result that is stored as a table in the scratch schema of your database. The key difference between a PDT and a materialized view is in how tables are refreshed:

  • For PDTs, the persistence strategy is defined in Looker, and the persistence is managed by Looker.
  • For materialized views, the database is responsible for maintaining and refreshing the data in the table.

For this reason, the materialized view functionality requires advanced knowledge of your dialect and its features. In most cases, your database will refresh the materialized view any time the database detects new data in the tables that are queried by the materialized view. Materialized views are optimal for scenarios that require real-time data.

See thematerialized_view parameter documentation page for information on dialect support, requirements, and important considerations.

Optimization strategies

Because persistent derived tables (PDTs) are stored in your database, you should optimize your PDTs using the following strategies, as supported by your dialect:

For example, to add persistence to thederived table example, you could set it to rebuild when the datagrouporders_datagroup triggers, and add indexes on bothcustomer_id andfirst_order, like this:

view: customer_order_summary {  derived_table: {    explore_source: orders {      ...    }    datagroup_trigger: orders_datagroup    indexes: ["customer_id", "first_order"]  }}

If you don't add an index (or an equivalent for your dialect), Looker will warn you that you should do so to improve query performance.

Use cases for PDTs

Persistent derived tables (PDTs) are useful because they can improve performance of a query by persisting the results of the query in a table.

As a general best practice, developers should try to model data without using PDTs until absolutely necessary.

In some cases data can be optimized through other means. For example, adding an index or changing a column's data type might resolve an issue without the need to create a PDT. Make sure to analyze the execution plans of slow queries using theExplain from SQL Runner tool.

In addition to reducing query time and database load on frequently run queries, there are several other use cases for PDTs including:

You can alsouse a PDT to define a primary key in cases where there is no reasonable way to identify a unique row in a table as a primary key.

Using PDTs to test optimizations

You can use PDTs to test different indexing, distributions, and other optimization options without needing a large amount of support from your DBA or ETL developers.

Consider a case where you have a table but want to test different indexes. Your initial LookML for the view may look like the following:

view: customer {  sql_table_name: warehouse.customer ;;}

To test optimization strategies, you can use theindexes parameter to add indexes to the LookML like this:

view: customer {  # sql_table_name: warehouse.customer  derived_table: {    sql: SELECT * FROM warehouse.customer ;;    persist_for: "8 hours"    indexes: [customer_id, customer_name, salesperson_id]  }}

Query the view once to generate the PDT. Then run your test queries and compare your results. If your results are favorable, you can ask your DBA or ETL team to add the indexes to the original table.

Remember to change your view code back to remove the PDT.

Using PDTs to pre-join or aggregate data

It can be useful to pre-join or pre-aggregate data to adjust query optimization for high volumes or multiple types of data.

For example, suppose you want to create a query for customers by cohort based on when they made their first order. This query might be expensive to run multiple times whenever the data is needed in real time; however, you can calculate the query only once and then reuse the results with a PDT:

view: customer_order_facts {  derived_table: {    sql: SELECT    c.customer_id,    MIN(o.order_date) OVER (PARTITION BY c.customer_id) AS first_order_date,    MAX(o.order_date) OVER (PARTITION BY c.customer_id) AS most_recent_order_date,    COUNT(o.order_id) OVER (PARTITION BY c.customer_id) AS lifetime_orders,    SUM(o.order_value) OVER (PARTITION BY c.customer_id) AS lifetime_value,    RANK() OVER (PARTITION BY c.customer_id ORDER BY o.order_date ASC) AS order_sequence,    o.order_id    FROM warehouse.customer c LEFT JOIN warehouse.order o ON c.customer_id = o.customer_id    ;;    sql_trigger_value: SELECT CURRENT_DATE ;;    indexes: [customer_id, order_id, order_sequence, first_order_date]  }}

Cascading derived tables

It is possible to reference one derived table in the definition of another, creating a chain ofcascading derived tables, orcascading persistent derived tables (PDTs), as the case may be. An example of cascading derived tables would be a table,TABLE_D, which depends on another table,TABLE_C, whileTABLE_C depends onTABLE_B, andTABLE_B depends onTABLE_A.

Note: Before you set up cascading derived tables, it is important to know how cascading tables areautomatically rebuilt by the Looker regenerator and how they can bemanually rebuilt by users. There are also otherimportant considerations for implementing persisted tables. All these topics are covered later on this page.

Syntax for referencing a derived table

To reference a derived table in another derived table, use this syntax:

`${derived_table_or_view_name.SQL_TABLE_NAME}`

In this format,SQL_TABLE_NAME is a literal string. For example, you can reference theclean_events derived table with this syntax:

`${clean_events.SQL_TABLE_NAME}`

You can use this same syntax to refer to a LookML view. Again, in this case, theSQL_TABLE_NAME is a literal string.

Note: The${derived_table_or_view_name.SQL_TABLE_NAME} syntax is not supported in thesql_trigger parameter of adatagroup, so you can't use a derived table to trigger a datagroup. But you can use the${derived_table_or_view_name.SQL_TABLE_NAME} syntax in thesql_trigger_value parameter to trigger your PDT rebuilds.

In the next example, theclean_events PDT is created from theevents table in the database. Theclean_events PDT leaves out unwanted rows from theevents database table. Then a second PDT is shown; theevent_summary PDT is a summary of theclean_events PDT. Theevent_summary table regenerates whenever new rows are added toclean_events.

Theevent_summary PDT and theclean_events PDT are cascading PDTs, whereevent_summary is dependent onclean_events (sinceevent_summary is defined using theclean_events PDT). This particular example could be done more efficiently in a single PDT, but it's useful for demonstrating derived table references.

view: clean_events {  derived_table: {    sql:      SELECT *      FROM events      WHERE type NOT IN ('test', 'staff') ;;    datagroup_trigger: events_datagroup  }}view: events_summary {  derived_table: {    sql:      SELECT        type,        date,        COUNT(*) AS num_events      FROM        ${clean_events.SQL_TABLE_NAME} AS clean_events      GROUP BY        type,        date ;;    datagroup_trigger: events_datagroup  }}

Although it's not always required, when you're referring to a derived table in this manner, it's often useful to create an alias for the table by using this format:

${derived_table_or_view_name.SQL_TABLE_NAME} AS derived_table_or_view_name

The previous example does this:

${clean_events.SQL_TABLE_NAME} AS clean_events

It is helpful to use an alias because, behind the scenes, PDTs are named with lengthy codes in your database. In some cases (especially withON clauses) it's possible to forget that you need to use the${derived_table_or_view_name.SQL_TABLE_NAME} syntax to retrieve this lengthy name. An alias can help to prevent this type of mistake.

How Looker builds cascading derived tables

In the case of cascadingtemporary derived tables, if a user's query results aren't in the cache, Looker will build all the derived tables that are needed for the query. If you have aTABLE_D whosedefinition contains a reference toTABLE_C, thenTABLE_D isdependent onTABLE_C. This means that if you queryTABLE_D and the query is not in Looker's cache, Looker will rebuildTABLE_D. But first, it must rebuildTABLE_C.

Consider a scenario with cascading temporary derived tables, whereTABLE_D is dependent onTABLE_C, which is dependent onTABLE_B, which is dependent onTABLE_A. If Looker doesn't have valid results for a query onTABLE_C in the cache, Looker will build all the tables it needs for the query. So Looker will buildTABLE_A, and thenTABLE_B, and thenTABLE_C:

In this scenario,TABLE_A must finish generating before Looker can start generatingTABLE_B, andTABLE_B must finish generating before Looker can start generatingTABLE_C. WhenTABLE_C is finished, Looker will provide the query results. (SinceTABLE_D isn't needed to answer this query, Looker won't rebuildTABLE_D at this time.)

See thedatagroup parameter documentation page for an example scenario of cascading PDTs that use the same datagroup.

The same basic logic applies for PDTs: Looker will build any table that is required to answer a query, all the way up the chain of dependencies. But with PDTs, it is often the case that the tables already exist and don't need to be rebuilt. With standard user queries on cascading PDTs, Looker rebuilds the PDTs in the cascade only if there's no valid version of the PDTs in the database. If you want to force a rebuild for all PDTs in a cascade, you canmanually rebuild the tables for a query through an Explore.

An important logical point to understand is that in the case of a PDT cascade, a dependent PDT is essentiallyquerying the PDT it depends on. This is significant especially for PDTs that use thepersist_for strategy. Typically,persist_for PDTs are built when a user queries them, remain in the database until theirpersist_for interval is up, and then are not rebuilt until they are next queried by a user. However, if apersist_for PDT is part of a cascade with trigger-based PDTs (PDTs that use thedatagroup_trigger,interval_trigger, orsql_trigger_value persistence strategy), thepersist_for PDT is essentially being queried whenever its dependent PDTs are rebuilt. So, in this case, thepersist_for PDT will be rebuilt on the schedule of its dependent PDTs. This means thatpersist_for PDTs can be affected by the persistence strategy of their dependents.

Manually rebuilding persistent tables for a query

Users can select theRebuild Derived Tables & Run option from an Explore's menu to override the persistence settings and rebuild all the persistent derived tables (PDTs) andaggregate tables required for the current query in the Explore:

Clicking the Explore Actions button opens the Explore menu, from which you can select Rebuild Derived Tables & Run.

This option is visible only to users withdevelop permission, and only after the Explore query has loaded.

TheRebuild Derived Tables & Run option rebuilds all the persistent tables (all the PDTs andaggregate tables) that are required to answer the query, regardless of their persistence strategy. This includes any aggregate tables and PDTs in the current query, and it also includes any aggregate tables and PDTsthat are referenced by the aggregate tables and PDTs in the current query.

In the case ofincremental PDTs, theRebuild Derived Tables & Run option triggers the build of a new increment. With incremental PDTs, an increment includes the time period specified in theincrement_key parameter, and also the number of previous time periods specified in theincrement_offset parameter, if any. See theIncremental PDTs documentation page for some example scenarios that show how incremental PDTs build, depending on their configuration.

In the case ofcascading PDTs, this means rebuilding all the derived tables in the cascade, starting at the top. This is the same behavior as when you query a table in acascade of temporary derived tables:

If table_c depends on table_b, and table_b depends on table_a, then rebuilding table_c first rebuilds table_a, then table_b, and finally table_c.

Note the following about manually rebuilding derived tables:

  • For the user who initiates theRebuild Derived Tables & Run operation, the query will wait for the tables to rebuild before loading results. Other users' queries will still use the existing tables. Once the persistent tables are rebuilt, then all users will use the rebuilt tables. Although this process is designed to avoid interrupting other users' queries while the tables are rebuilding, those users could still be affected by the additional load on your database. If you are in a situation where triggering a rebuild during business hours could put an unacceptable strain on your database, you may need to communicate to your users that they should never rebuild certain PDTs or aggregate tables during those hours.
  • If a user is inDevelopment Mode and the Explore is based on adevelopment table, theRebuild Derived Tables & Run operation will rebuild the development table, not the production table, for the Explore. But if the Explore in Development Mode is using the production version of a derived table, the production table will be rebuilt. SeePersisted tables in Development Mode for information on development tables and production tables.

  • For Looker-hosted instances, if the derived table takes longer than one hour to rebuild, the table won't rebuild successfully and the browser session will time out. See theQuery timeouts and queueing section on theAdmin settings - Queries documentation page for more information about timeouts that may affect Looker processes.

Persisted tables in Development Mode

Looker has some special behaviors for managing persisted tables inDevelopment Mode.

Note: These behaviors apply to any type of persisted table, which includesaggregate tables as well as persistent derived tables (PDTs).

If you query a persisted table in Development Modewithout making any changes to its definition, Looker will query the production version of that table. If youdo make a change to the table definition that affects the data in the table or the way that the table is queried, a new development version of the table will be created the next time you query the table in Development Mode. Having such a development table lets you test changes without disturbing users.

What prompts Looker to create a development table

When possible, Looker uses the existing production table to answer queries, whether or not you are in Development Mode. But there are certain cases where Looker cannot use the production table for queries in Development Mode:

  • If your persisted table has a parameter that narrows its dataset towork faster in Development Mode
  • If you have made changes to the definition of your persisted table that affect the data in the table

Looker will build a development table if you are in Development Mode and you query aSQL-based derived table that is defined using aconditionalWHERE clause withif prod andif dev statements.

Note: This is not the case fornative derived tables with thedev_filters parameter. For native derived tables withdev_filters, Looker has the logic to use the production table to answer queries in Development Mode, unless you change the definition of the table and then query the table in Development Mode.

For persisted tables that don't have a parameter to narrow the dataset in Development Mode, Looker uses the production version of the table to answer queries in Development Mode, unless you change the definition of the table andthen query the table in Development Mode. This goes for any changes to the table that affect the data in the table or the way that the table is queried.

Here are some examples of the types of changes that will prompt Looker to create a development version of a persistent table (Looker will create the table only if you subsequently query the table after making these changes):

For changes that don't modify the table's data or affect the way that Looker queries the table, Looker won't create a development table. Thepublish_as_db_view parameter is a good example: In Development Mode, if you change only thepublish_as_db_view setting for a derived table, Looker doesn't need to rebuild the derived table so won't create a development table.

How long Looker persists development tables

Regardless of the table's actual persistence strategy, Looker treats development persisted tables as if they had apersistence strategy ofpersist_for: "24 hours". Looker does this to ensure that development tables aren't persisted for more than a day, since a Looker developer may query many iterations of a table during development, and each time a new development table is built. To prevent the development tables from cluttering the database, Looker applies thepersist_for: "24 hours" strategy to be sure that the tables are frequently cleaned from the database.

Otherwise, Looker builds persistent derived tables (PDTs) and aggregate tables in Development Mode the same way it builds persisted tables in Production Mode.

If a development table is persisted on your database when you deploy changes to a PDT or an aggregate table, Looker can often use the development table as the production table so that your users don't have to wait for the table to build when they query the table.

Note that when you deploy your changes, the table may still need to be rebuilt to be queried in production, depending on the situation:

  • If it has been over 24 hours since you queried the table in Development Mode, the development version of the table is tagged as expired and won't be used for queries. You can check for unbuilt PDTsby using the Looker IDE or by using theDevelopment tab of thePersistent Derived Tables page. If you have unbuilt PDTs, you can query them in Development Mode right before you make your changes so that the development table is available to be used in production.
  • If a persisted table has thedev_filters parameter (fornative derived tables) or theconditionalWHERE clause that is using theif prod andif dev statements (forSQL-based derived tables), the development table cannot be used as the production version, since the development version has an abbreviated dataset. If this is the case, after you've finished developing the table and before you deploy your changes, you can comment out thedev_filters parameter or the conditionalWHERE clause and then query the table in Development Mode. Looker will then build a full version of the table that can be used for production when you deploy your changes.

Otherwise, if you deploy your changes when there is no valid development table that can be used as the production table, Looker will rebuild the table the next time the table is queried in Production Mode (for persisted tables that use thepersist_for strategy), or the next time theregenerator runs (for persisted tables that usedatagroup_trigger,interval_trigger, orsql_trigger_value).

Checking for unbuilt PDTs in Development Mode

If a development table is persisted on your database when you deploy changes to a persistent derived table (PDT) or an aggregate table, Looker can often use the development table as the production table so that your users don't have to wait for the table to build when they query the table. See theHow long Looker persists development tables andWhat prompts Looker to create a development table sections on this page for more details.

Therefore, it is optimal that all your PDTs are built when you deploy to production so that the tables can be used immediately as the production versions.

You can check your project for unbuilt PDTs in theProject Health panel. Click theProject Health icon in the Looker IDE to open theProject Health panel. Then click theValidate PDT Status button.

If there are unbuilt PDTs, theProject Health panel will list them:

The Project Health panel shows both a list of unbuilt PDTs for the project and a Go to PDT Management button.

If you havesee_pdts permission, you can click theGo to PDT Management button. Looker will open theDevelopment tab of thePersistent Derived Tables page and filter the results to your specific LookML project. From there, you can see which development PDTs are built and unbuilt, as well as access other troubleshooting information. See theAdmin settings - Persistent Derived Tables documentation page for more information.

Once you identify an unbuilt PDT in your project, you can build a development version it by opening an Explore that queries the table, then using theRebuild Derived Tables & Run option from the Explore menu. See theManually rebuilding persistent tables for a query section on this page.

Note: If a persisted table is aSQL-based derived table that is defined using aconditionalWHERE clause withif prod andif dev statements, you won't be able to use the development version of the table for production, since the development version of the table has an abbreviated dataset. See theHow long Looker persists development tables section on this page for more information. Fornative derived tables with thedev_filters parameter, Looker does use the production table to answer queries in Development Mode, unless you change the definition of the table and then query the table in Development Mode (see theWhat prompts Looker to create a development table section on this documentation page).

Table sharing and cleanup

Within any given Looker instance, Looker will share persisted tables between users if the tables have the same definition and the same persistence method setting. Additionally, if a table's definition ever ceases to exist, Looker marks the table as expired.

This has several benefits:

  • If you haven't made any changes to a table in Development Mode, your queries will use the existing production tables. This is the case unless your table is aSQL-based derived table that is defined using aconditionalWHERE clause withif prod andif dev statements. If the table is defined with a conditionalWHERE clause, Looker will build a development table if you query the table in Development Mode. (Fornative derived tables with thedev_filters parameter, Looker has the logic to use the production table to answer queries in Development Mode, unless you change the definition of the table and then query the table in Development Mode.)
  • If two developers happen to make the same change to a table while in Development Mode, they will share the same development table.
  • Once you push your changes from Development Mode to Production Mode, the old production definition does not exist anymore, so the old production table is marked as expired and will be dropped.
  • If you decide to throw away your Development Mode changes, that table definition does not exist anymore, so the unneeded development tables are marked as expired and will be dropped.

Working faster in Development Mode

There are situations when the persistent derived table (PDT) that you're creating takes a long time to generate, which can be time-consuming if you are testing lots of changes in Development Mode. For these cases, you can prompt Looker to create smaller versions of a derived table when you're in Development Mode.

Fornative derived tables, you can use thedev_filters subparameter ofexplore_source to specify filters that are only applied to development versions of the derived table:

view: e_faa_pdt {  derived_table: {  ...    datagroup_trigger: e_faa_shared_datagroup    explore_source: flights {      dev_filters: [flights.event_date: "90 days"]      filters: [flights.event_date: "2 years", flights.airport_name: "Yucca Valley Airport"]      column: id {}      column: airport_name {}      column: event_date {}    }  }...}

This example includes adev_filters parameter that filters the data to the last 90 days and afilters parameter that filters the data to the last 2 years and to the Yucca Valley Airport.

Thedev_filters parameter acts in conjunction with thefilters parameter so that all filters are applied to the development version of the table. If bothdev_filters andfilters specify filters for the same column,dev_filters takes precedence for the development version of the table. In this example, the development version of the table will filter the data to the last 90 days for the Yucca Valley Airport.

ForSQL-based derived tables, Looker supports a conditionalWHERE clause with different options for production (if prod) and development (if dev) versions of the table:

view: my_view {  derived_table: {    sql:      SELECT        columns      FROM        my_table      WHERE        -- if prod -- date > '2000-01-01'        -- if dev -- date > '2020-01-01'      ;;  }}

In this example, the query will include all data from 2000 onward when in Production Mode but only the data from 2020 onward when in Development Mode. Using this feature strategically to limit your result set, and increase query speed, can make Development Mode changes much easier to validate.

Note: If a persisted table has thedev_filters parameter or the conditionalWHERE clause, the development table cannot be used as the production version, since the development version has an abbreviated dataset. If this is the case, after you've finished developing the table and before you deploy your changes, you can comment out thedev_filters parameter or the conditionalWHERE clause and then query the table in Development Mode. Looker will then build a full version of the table that can be used for production when you deploy your changes. See theHow long Looker persists development tables section on this page for more details about using development tables in production.

How Looker builds PDTs

After a persistent derived table (PDT) has been defined and is either run for the first time or triggered by theregenerator for rebuilding according to itspersistence strategy, Looker will go through the following steps:

  1. Use the derived table SQL to fashion a CREATE TABLE AS SELECT (or CTAS) statement and execute it. For example, to rebuild a PDT calledcustomer_orders_facts:CREATE TABLE tmp.customer_orders_facts AS SELECT ... FROM ... WHERE ...
  2. Issue the statements to create the indexes when the table is built
  3. Rename the table from LC$.. ("Looker Create") to LR$.. ("Looker Read"), to indicate the table is ready to use
  4. Drop any older version of the table that should no longer be in use

There are a few important implications:

  • The SQL that forms the derived table must be valid inside a CTAS statement.
  • The column aliases on the result set of the SELECT statement must be valid column names.
  • The names used when specifying distribution, sortkeys, and indexes must be the column names that are listed in the SQL definition of the derived table, not the field names that are defined in the LookML.

The Looker regenerator

The Looker regenerator checks the status and initiates rebuilds for trigger-persisted tables. A trigger-persisted table is a persistent derived table (PDT) or anaggregate table that uses a trigger as a persistence strategy:

  • For tables that usesql_trigger_value, the trigger is a query that is specified in the table'ssql_trigger_value parameter. The Looker regenerator triggers a rebuild of the table when the result of the latest trigger query check is different from the result of the previous trigger query check. For example, if your derived table is persisted with the SQL querySELECT CURDATE(), the Looker regenerator will rebuild the table the next time the regenerator checks the trigger after the date changes.
  • For tables that useinterval_trigger, the trigger is a time duration that is specified in the table'sinterval_trigger parameter. The Looker regenerator triggers a rebuild of the table when the specified time has passed.
  • For tables that usedatagroup_trigger, the trigger can be a query specified in the associated datagroup'ssql_trigger parameter, or the trigger can be a time duration that is specified in the datagroup'sinterval_trigger parameter.

The Looker regenerator also initiates rebuilds for persisted tables that use thepersist_for parameter, but only when thepersist_for table is a dependencycascade of a trigger-persisted table. In this case, the Looker regenerator will initiate rebuilds for apersist_for table, since the table is needed to rebuild the other tables in the cascade. Otherwise, the regenerator doesn't monitor persisted tables that use thepersist_for strategy.

The Looker regenerator cycle begins at a regular interval that is configured by your Looker admin in theMaintenance Schedule setting on your database connection (the default is a five-minute interval). However, the Looker regenerator does not start a new cycle until it has completed all the checks and PDT rebuilds from the last cycle. This means if you have long-running PDT builds, the Looker regenerator cycle may not run as often as defined in theMaintenance Schedule setting. Other factors can affect the time that is required to rebuild your tables, as described in theImportant considerations for implementing persisted tables section on this page.

In cases where a PDT fails to build, the regenerator may attempt to rebuild the table in the next regenerator cycle:

  • If theRetry Failed PDT Builds setting is enabled on your database connection, the Looker regenerator will attempt to rebuild the table during the next regenerator cycle, even if the table's trigger condition is not met.
  • If theRetry Failed PDT Builds setting is disabled, the Looker regenerator won't attempt to rebuild the table until the PDT's trigger condition is met.

If a user requests data from the persisted table while it is being built and the query results aren't in the cache, Looker checks to see if the existing table is still valid. (The previous table may not be valid if it is not compatible with the new version of the table, which can happen if the new table has a different definition, the new table uses a different database connection, or the new table was created with a different version of Looker.) If the existing table is still valid, Looker will return data from the existing table until the new table is built. Otherwise, if the existing table is not valid, Looker will provide query results once the new table is rebuilt.

Important considerations for implementing persisted tables

Considering the usefulness of persisted tables (PDTs andaggregate tables), it is possible to accumulate many of them on your Looker instance. It is possible to create a scenario in which theLooker regenerator needs to build many tables at the same time. Especially withcascading tables, or long-running tables, you can create a scenario where tables have a long delay before rebuilding, or where users experience a delay in getting query results from a table while the database is working hard to generate the table.

The Lookerregenerator checks PDT triggers to see if it should rebuild trigger-persisted tables. The regenerator cycle is set at a regular interval that is configured by your Looker admin in theMaintenance Schedule setting on your database connection (the default is a five-minute interval).

Several factors can affect the time that is required to rebuild your tables:

  • Your Looker admin may have changed the interval of the regenerator trigger checks by using theMaintenance Schedule setting on your database connection.
  • The Looker regenerator does not start a new cycle until it has completed all of the checks and PDT rebuilds from the last cycle. So if you have long-running PDT builds, the Looker regenerator cycle may not be as frequent as theMaintenance Schedule setting.
  • By default, the regenerator can initiate the rebuilding of one PDT or aggregate table at a time over a connection. A Looker admin can adjust the regenerator's allowed number of concurrent rebuilds by using theMax number of PDT builder connections field in a connection's settings.
  • All PDTs and aggregate tables triggered by the samedatagroup will rebuild during the same regeneration process. This can be a heavy load if you have many tables using the datagroup, either directly or as a result ofcascading dependencies.

In addition to the previous considerations, there are also some situations in which you should avoid adding persistence to a derived table:

  • When derived tables will beextended — Each extension of a PDT will create a new copy of the table in your database.
  • When derived tables usetemplated filters or Liquid parameters — Persistence is not supported for derived tables that use templated filters or Liquid parameters.
  • Whennative derived tables are built from Explores that useuser attributes withaccess_filters, or withsql_always_where — Copies of the table will be built in your database for each possible user attribute value specified.
  • When the underlying data changes frequently and your database dialect does not supportincremental PDTs.
  • When the cost and time involved in creating PDTs is too high.

Depending on the number and complexity of persisted tables on your Looker connection, the queue might contain many persisted tables that need to be checked and rebuilt at each cycle, so it is important to keep these factors in mind when implementing derived tables on your Looker instance.

Managing PDTs at scale using API

Monitoring and managing persistent derived tables (PDTs) that refresh on varying schedules becomes increasingly complex as you create more PDTs on your instance. Consider using the LookerApache Airflow integration to manage your PDT schedules alongside your other ETL and ELT processes.

Monitoring and troubleshooting PDTs

If you use persistent derived tables (PDTs), and especiallycascading PDTs, it is helpful to see the status of your PDTs. You can use the LookerPersistent Derived Tables admin page to see the status of your PDTs. You can also check thePDT troubleshooting tree for step-by-step debugging.

When attempting to troubleshoot PDTs:

  • Pay special attention to the distinction betweendevelopment tables and production tables when investigating thePDT Event Log.
  • Verify that theTemp Database setting on your Looker connection matches your actual scratch schema or database. If theTemp Database setting on the connection doesn't match the scratch schema on your database, update theTemp Database setting so that Looker can store persistent derived tables on your database.
  • Determine if there are problems with all PDTs, or just one. If there is a problem with one, then the issue is likely caused by a LookML or SQL error.
  • Determine if problems with the PDT correspond with the times when it is scheduled to rebuild.
  • Make sure that allsql_trigger_value queries evaluate successfully and that they return only one row and column. For SQL-based PDTs, you can do this by running them inSQL Runner. (Applying aLIMIT protects from runaway queries.) For more information on using SQL Runner to debug derived tables, see theUsing sql runner to test derived tables Community post.
  • For SQL-based PDTs, use SQL Runner to verify that the SQL of the PDT executes without error. (Be sure to apply aLIMIT in SQL Runner to keep query times reasonable.)
  • For SQL-based derived tables, avoid usingcommon table expressions (CTEs). Using CTEs with DTs creates nestedWITH statements that can cause PDTs to fail without warning. Instead, use the SQL for your CTE to create a secondary DT and reference that DT from your first DT using the${derived_table_or_view_name.SQL_TABLE_NAME} syntax.
  • Check that any tables on which the problem PDT depends — whether normal tables or PDTs themselves — exist and can be queried.
  • Ensure that any tables on which the problem PDT depends don't have any shared or exclusive locks. For Looker to successfully build a PDT, it needs to acquire an exclusive lock on the table that needs to be updated. This will conflict with other shared or exclusive locks that are on the table. Looker will be unable to update the PDT until all other locks have cleared. The same is true for any exclusive locks on the table that Looker is building a PDT from; if there is an exclusive lock on a table, Looker won't be able to acquire a shared lock to run queries until the exclusive lock clears.
  • Use theShow Processes button in SQL Runner. If there are a large number of processes active, this could slow down query times.
  • Monitor comments in the query. See theQuery comments for PDTs section on this page.

Query comments for PDTs

Database administrators can differentiate normal queries from those that generate persistent derived tables (PDTs). Looker adds comments to theCREATE TABLE ... AS SELECT ... statement that includes the PDT's LookML model and view, plus a unique identifier (slug) for the Looker instance. If the PDT is being generated on behalf of a user in Development Mode, the comments will indicate the user's ID. The PDT generation comments follow this pattern:

-- Building `<view_name>` in dev mode for user `<user_id>` on instance `<instance_slug>`CREATE TABLE `<table_name>` SELECT ...-- finished `<view_name>` => `<table_name>`

The PDT generation comment will appear in an Explore's SQL tab if Looker has had to generate a PDT for the Explore's query. The comment will appear at the top of the SQL statement.

Finally, the PDT generation comment appears in theMessage field on theInfo tab of theQuery Details pop-up for each query on theQueries admin page.

Rebuilding PDTs after a failure

When a persistent derived table (PDT) has a failure, here is what happens when that PDT is queried:

  • Looker will use the results in the cache if the same query was previously run. (See theCaching queries documentation page for an explanation of how this works.)
  • If the results aren't in the cache, Looker will pull results from the PDT in the database, if a valid version of the PDT exists.
  • If there is no valid PDT in the database, Looker will attempt to rebuild the PDT.
  • If the PDT can't be rebuilt, Looker will return an error for a query. TheLooker regenerator will attempt to rebuild the PDT the next time the PDT is queried or the next time the PDT's persistence strategy triggers a rebuild.

Withcascading PDTs, the same logic applies, except that with cascading PDTs:

  • A failure to build for one table prevents the building of the PDTs down the dependency chain.
  • A dependent PDT is essentially querying the PDT it relies on, so the persistence strategy of one table can trigger rebuilds of the PDTs goingup the chain.

Revisiting the previous example ofcascading tables, whereTABLE_D is dependent onTABLE_C, which is dependent onTABLE_B, which is dependent onTABLE_A:

IfTABLE_B has a failure, all the standard (non-cascade) behavior applies forTABLE_B:

  1. IfTABLE_B is queried, Looker first tries to use the cache to return results.
  2. If this attempt fails, Looker next tries to use a previous version of the table, if possible.
  3. If this attempt also fails, Looker then tries to rebuild the table.
  4. Finally, ifTABLE_B can't be rebuilt, Looker will return an error.

Looker will try again to rebuildTABLE_B when the table is next queried or when the table's persistence strategy next triggers a rebuild.

The same also applies for the dependents ofTABLE_B. So ifTABLE_B can't be built, and there is a query onTABLE_C, the following sequence occurs:

  1. Looker will try to use the cache for the query onTABLE_C.
  2. If the results aren't in the cache, Looker will try to pull results fromTABLE_C in the database.
  3. If there is no valid version ofTABLE_C, Looker will try to rebuildTABLE_C, which creates a query onTABLE_B.
  4. Looker will then try to rebuildTABLE_B (which will fail ifTABLE_B hasn't been fixed).
  5. IfTABLE_B can't be rebuilt, thenTABLE_C can't rebuild, so Looker will return an error for the query onTABLE_C.
  6. Looker will then attempt to rebuildTABLE_C according to its usual persistence strategy, or the next time the PDT is queried (which includes the next timeTABLE_D tries to build, sinceTABLE_D depends onTABLE_C).

Once you resolve the problem withTABLE_B, thenTABLE_B and each of the dependent tables will attempt to rebuild according to their persistence strategies, or the next time they are queried (which includes the next time a dependent PDT attempts to rebuild). Or, if a development version of the PDTs in the cascade was built in Development Mode, the development versions may be used as the new production PDTs. (See thePersisted tables in Development Mode section on this page for how this works.) Or you can use an Explore to run a query onTABLE_D and thenmanually rebuild the PDTs for the query, which will force a rebuild of all the PDTs going up the dependency cascade.

Improving PDT performance

When youcreate persistent derived tables (PDTs), performance can be a concern. Especially when the table is very large, querying the table may be slow, just as it can be for any large table in your database.

You can improve performance byfiltering the data or bycontrolling how the data in the PDT is sorted and indexed.

Adding filters to limit the dataset

With particularly large datasets, having many rows will slow down queries against a persistent derived table (PDT). If you usually query only recent data, consider adding a filter to theWHERE clause of your PDT that limits the table to 90 days or fewer of data. This way, only relevant data will be added to the table each time it rebuilds so that running queries will be much faster. Then, you can create a separate, larger PDT for historical analysis to allow for both fast queries for recent data and the ability to query old data.

Usingindexes orsortkeys anddistribution

When you create a large persistent derived table (PDT), indexing the table (for dialects such as MySQL or Postgres) or adding sortkeys and distribution (for Redshift) can help with performance.

It is usually best to add theindexes parameter on ID or date fields.

For Redshift, it is usually best to add thesortkeys parameter on ID or date fields and thedistribution parameter on the field that is used for joining.

Recommended settings to improve performance

The following settings control how the data in the persistent derived table (PDT) is sorted and indexed. These settings are optional, but highly recommended:

  • For Redshift and Aster, use thedistribution parameter to specify the column name whose value is used to spread the data around a cluster. When two tables are joined by the column specified in thedistribution parameter, the database can find the join data on the same node, so inter-node I/O is minimized.
  • For Redshift, set thedistribution_style parameter toall to instruct the database to keep a complete copy of the data on each node. This is often used to minimize inter-node I/O when reltively small tables are joined. Set this value toeven to instruct the database to spread the data evenly through the cluster without using a distribution column. This value can only be specified whendistribution is not specified.
  • For Redshift, use thesortkeys parameter. The values specify which columns of the PDT are used to sort the data on disk to make searching easier. On Redshift, you may use eithersortkeys orindexes, but not both.
  • On most databases, use theindexes parameter. The values specify which columns of the PDT are indexed. (On Redshift, indexes are used to generate interleaved sort keys.)

Except as otherwise noted, the content of this page is licensed under theCreative Commons Attribution 4.0 License, and code samples are licensed under theApache 2.0 License. For details, see theGoogle Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2026-02-19 UTC.