Best practices for functions

This document describes how to optimize queries that use SQL functions.

Optimize string comparison

Best practice: When possible, useLIKE instead ofREGEXP_CONTAINS.

In BigQuery, you can use theREGEXP_CONTAINSfunction or theLIKEoperator to compare strings.REGEXP_CONTAINS provides more functionality,but also has a slower execution time. UsingLIKE instead ofREGEXP_CONTAINSis faster, particularly if you don't need the full power of regular expressionsthatREGEXP_CONTAINS provides, for example wildcard matching.

Consider the following use of theREGEXP_CONTAINS function:

SELECTdim1FROM`dataset.table1`WHEREREGEXP_CONTAINS(dim1,'.*test.*');

You can optimize this query as follows:

SELECTdim1FROM`dataset.table`WHEREdim1LIKE'%test%';

Optimize aggregation functions

Best practice: If your use case supports it, use an approximate aggregationfunction.

If the SQL aggregation function you're using has an equivalent approximationfunction, the approximation function yields faster query performance. Forexample, instead of usingCOUNT(DISTINCT),useAPPROX_COUNT_DISTINCT.For more information, seeapproximate aggregation functions.

You can also useHyperLogLog++ functions to do approximations (including customapproximate aggregations). For more information, seeHyperLogLog++ functionsin the GoogleSQL reference.

Consider the following use of theCOUNT function:

SELECTdim1,COUNT(DISTINCTdim2)FROM`dataset.table`GROUPBY1;

You can optimize this query as follows:

SELECTdim1,APPROX_COUNT_DISTINCT(dim2)FROM`dataset.table`GROUPBY1;

Optimize quantile functions

Best practice: When possible, useAPPROX_QUANTILE instead ofNTILE.

Running a query that contains theNTILEfunction can fail with aResources exceedederror if there are too manyelements toORDER BY in a single partition, which causes data volume to grow.The analytic window isn't partitioned, so theNTILE computation requiresa globalORDER BY for all rows in the table to be processedby a single worker/slot.

Try usingAPPROX_QUANTILESinstead. This function allows the query to run more efficiently because itdoesn't require a globalORDER BY for all rows in the table.

Consider the following use of theNTILE function:

SELECTindividual_id,NTILE(nbuckets)OVER(ORDERBYsalesdesc)ASsales_thirdFROM`dataset.table`;

You can optimize this query as follows:

WITHQuantInfoAS(SELECTo,qvalFROMUNNEST((SELECTAPPROX_QUANTILES(sales,nbuckets)FROM`dataset.table`))ASqvalWITHoffsetoWHEREo>0)SELECTindividual_id,(SELECT(nbuckets+1)-MIN(o)FROMQuantInfoWHEREsales<=QuantInfo.qval)ASsales_thirdFROM`dataset.table`;

The optimized version gives similar but not identical results to the originalquery, becauseAPPROX_QUANTILES:

  1. Provides an approximate aggregation.
  2. Places the remainder values (the remainder of the number of rowsdivided by buckets) in a different way.

Optimize UDFs

Best practice: Use SQL UDFs for simple calculations because the query optimizer can apply optimizations to SQL UDF definitions. Use Javascript UDFs for complex calculations that are not supported by SQL UDF.

Calling a JavaScript UDF requires the instantiation of a subprocess.Spinning up this process and running the UDF directly impacts query performance.If possible, use anative (SQL) UDFinstead.

Persistent UDFs

It is better to create persistent user-defined SQL and JavaScript functionsin a centralized BigQuery dataset that can be invoked acrossqueries and in logical views, as opposed to creating and calling a UDF in codeeach time. Creating org-wide libraries of business logic within shared datasetshelps optimize performance and use fewer resources.

The following example shows how a temporary UDF is invoked in a query:

CREATETEMPFUNCTIONaddFourAndDivide(xINT64,yINT64)AS((x+4)/y);WITHnumbersAS(SELECT1asvalUNIONALLSELECT3asvalUNIONALLSELECT4asvalUNIONALLSELECT5asval)SELECTval,addFourAndDivide(val,2)ASresultFROMnumbers;

You can optimize this query by replacing the temporary UDF with apersistent one:

WITHnumbersAS(SELECT1asvalUNIONALLSELECT3asvalUNIONALLSELECT4asvalUNIONALLSELECT5asval)SELECTval,`your_project.your_dataset.addFourAndDivide`(val,2)ASresultFROMnumbers;

Except as otherwise noted, the content of this page is licensed under theCreative Commons Attribution 4.0 License, and code samples are licensed under theApache 2.0 License. For details, see theGoogle Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-12-15 UTC.