Migrating to GoogleSQL

BigQuery supports two SQL dialects:GoogleSQL andlegacy SQL. This document explains thedifferences between the two dialects, includingsyntax,functions, andsemantics,and gives examples of some of thehighlights of GoogleSQL.

Comparison of legacy and GoogleSQL

When initially released, BigQuery ran queries using anon-GoogleSQL dialect known asBigQuerySQL. With the launch ofBigQuery 2.0, BigQuery released support forGoogleSQL, andrenamed BigQuery SQL to legacy SQL. GoogleSQL is thepreferred SQL dialect for querying data stored in BigQuery.

Do I have to migrate to GoogleSQL?

We recommend migrating from legacy SQL to GoogleSQL, but it's not required. Forexample, suppose that you execute many queries that use legacy SQL, but you wantto take advantage of a GoogleSQL feature for a new query. You can create newqueries using GoogleSQL that run alongside queries using legacy SQL.

Enabling GoogleSQL

You have a choice of whether to use legacy or GoogleSQL when you run aquery. For information about switching between SQL dialects, seeBigQuery SQL dialects.

Advantages of GoogleSQL

GoogleSQL complies with the SQL 2011 standard, and has extensions thatsupport querying nested and repeated data. It has several advantages over legacySQL, including:

For examples that demonstrate some of these features, seeGoogleSQL highlights.

Type differences

Legacy SQL types have an equivalent in GoogleSQL and vice versa. In somecases, the type has a different name. The following table lists each legacy SQLdata type and its GoogleSQL equivalent.

Legacy SQLGoogleSQLNotes
BOOLBOOL
INTEGERINT64
FLOATFLOAT64
STRINGSTRING
BYTESBYTES
RECORDSTRUCT
REPEATEDARRAY
TIMESTAMPTIMESTAMPSeeTIMESTAMP differences
DATEDATELegacy SQL has limited support forDATE
TIMETIMELegacy SQL has limited support forTIME
DATETIMEDATETIMELegacy SQL has limited support forDATETIME

For more information about the GoogleSQL type system, see theGoogleSQL data types reference.For more information about data types in BigQuery, see theBigQuery data types reference.

TIMESTAMP differences

GoogleSQL has astricter range of validTIMESTAMP valuesthan legacy SQL does. In GoogleSQL, validTIMESTAMP values are in therange of0001-01-01 00:00:00.000000 to9999-12-31 23:59:59.999999. Forexample, you can select the minimum and maximumTIMESTAMP values usingGoogleSQL:

#standardSQLSELECTmin_timestamp,max_timestamp,UNIX_MICROS(min_timestamp)ASmin_unix_micros,UNIX_MICROS(max_timestamp)ASmax_unix_microsFROM(SELECTTIMESTAMP'0001-01-01 00:00:00.000000'ASmin_timestamp,TIMESTAMP'9999-12-31 23:59:59.999999'ASmax_timestamp);

This query returns-62135596800000000 asmin_unix_micros and253402300799999999 asmax_unix_micros.

If you select a column that contains timestamp values outside of thisrange, you receive an error:

#standardSQLSELECTtimestamp_column_with_invalid_valuesFROMMyTableWithInvalidTimestamps;

This query returns the following error:

Cannotreturnaninvalidtimestampvalueof-8446744073709551617microsecondsrelativetotheUnixepoch.Therangeofvalidtimestampvaluesis[0001-01-100:00:00,9999-12-3123:59:59.999999]

To correct the error, one option is to define and use auser-defined functionto filter the invalid timestamps:

#standardSQLCREATETEMPFUNCTIONTimestampIsValid(tTIMESTAMP)AS(t>=TIMESTAMP('0001-01-01 00:00:00')ANDt<=TIMESTAMP('9999-12-31 23:59:59.999999'));SELECTtimestamp_column_with_invalid_valuesFROMMyTableWithInvalidTimestampsWHERETimestampIsValid(timestamp_column_with_invalid_values);

Another option to correct the error is to use theSAFE_CASTfunction with the timestamp column. For example:

#standardSQLSELECTSAFE_CAST(timestamp_column_with_invalid_valuesASSTRING)AStimestamp_stringFROMMyTableWithInvalidTimestamps;

This query returnsNULL rather than a timestamp string for invalidtimestamp values.

GoogleSQL highlights

This section discusses some of the highlights of GoogleSQL compared tolegacy SQL.

Composability usingWITH clauses

Some of the GoogleSQL examples on this page make use of aWITH clause,which enables extraction or reuse of named subqueries. For example:

#standardSQLWITHTAS(SELECTxFROMUNNEST([1,2,3,4])ASx)SELECTx/(SELECTSUM(x)FROMT)ASweighted_xFROMT;

This query defines a named subqueryT that containsx values of 1, 2, 3,and 4. It selectsx values fromT and divides them by the sum of allxvalues inT. This query is equivalent to a query where the contents ofTare inline:

#standardSQLSELECTx/(SELECTSUM(x)FROM(SELECTxFROMUNNEST([1,2,3,4])ASx))ASweighted_xFROM(SELECTxFROMUNNEST([1,2,3,4])ASx);

As another example, consider this query, which uses multiple named subqueries:

#standardSQLWITHTAS(SELECTxFROMUNNEST([1,2,3,4])ASx),TPlusOneAS(SELECTx+1ASyFROMT),TPlusOneTimesTwoAS(SELECTy*2ASzFROMTPlusOne)SELECTzFROMTPlusOneTimesTwo;

This query defines a sequence of transformations of the original data, followedby aSELECT statement overTPlusOneTimesTwo. This query is equivalent to thefollowing query, which inlines the computations:

#standardSQLSELECT(x+1)*2ASzFROM(SELECTxFROMUNNEST([1,2,3,4])ASx);

For more information, seeWITH clause.

Composability using SQL functions

GoogleSQL supportsuser-defined SQL functions.You can use user-defined SQL functions to define common expressions and thenreference them from the query. For example:

#standardSQL-- Computes the harmonic mean of the elements in 'arr'.-- The harmonic mean of x_1, x_2, ..., x_n can be expressed as:--   n / ((1 / x_1) + (1 / x_2) + ... + (1 / x_n))CREATETEMPORARYFUNCTIONHarmonicMean(arrARRAY<FLOAT64>)AS(ARRAY_LENGTH(arr)/(SELECTSUM(1/x)FROMUNNEST(arr)ASx));WITHTAS(SELECTGENERATE_ARRAY(1.0,x*4,x)ASarrFROMUNNEST([1,2,3,4,5])ASx)SELECTarr,HarmonicMean(arr)ASh_meanFROMT;

This query defines a SQL function namedHarmonicMean and then applies it tothe array columnarr fromT.

Subqueries in more places

GoogleSQL supports subqueries in theSELECT list,WHERE clause, andanywhere else in the query that expects an expression. For example, consider thefollowing GoogleSQL query that computes the fraction of warm days in Seattlein 2015:

#standardSQLWITHSeattleWeatherAS(SELECT*FROM`bigquery-public-data.noaa_gsod.gsod2015`WHEREstn='994014')SELECTCOUNTIF(max>=70)/(SELECTCOUNT(*)FROMSeattleWeather)ASwarm_days_fractionFROMSeattleWeather;

The Seattle weather station has an ID of'994014'. The query computes thenumber of warm days based on those where the temperature reached 70 degreesFahrenheit, or approximately 21 degrees Celsius, divided by the total number ofrecorded days for that station in 2015.

Correlated subqueries

In GoogleSQL, subqueries can reference correlated columns; that is, columnsthat originate from the outer query. For example, consider the followingGoogleSQL query:

#standardSQLWITHWashingtonStationsAS(SELECTweather.stnASstation_id,ANY_VALUE(station.name)ASnameFROM`bigquery-public-data.noaa_gsod.stations`ASstationINNERJOIN`bigquery-public-data.noaa_gsod.gsod2015`ASweatherONstation.usaf=weather.stnWHEREstation.state='WA'ANDstation.usaf!='999999'GROUPBYstation_id)SELECTwashington_stations.name,(SELECTCOUNT(*)FROM`bigquery-public-data.noaa_gsod.gsod2015`ASweatherWHEREwashington_stations.station_id=weather.stnANDmax>=70)ASwarm_daysFROMWashingtonStationsASwashington_stationsORDERBYwarm_daysDESC;

This query computes the names of weather stations in Washington state and thenumber of days in 2015 that the temperature reached 70 degrees Fahrenheit, orapproximately 21 degrees Celsius. Notice that there is a subquery in theSELECT list, and that the subquery referenceswashington_stations.station_idfrom the outer scope, namelyFROM WashingtonStations AS washington_stations.

Arrays and structs

ARRAY andSTRUCT are powerful concepts in GoogleSQL. As an example thatuses both, consider the following query, which computes the top two articlesfor each day in the HackerNews dataset:

#standardSQLWITHTitlesAndScoresAS(SELECTARRAY_AGG(STRUCT(title,score))AStitles,EXTRACT(DATEFROMtime_ts)ASdateFROM`bigquery-public-data.hacker_news.stories`WHEREscoreISNOTNULLANDtitleISNOTNULLGROUPBYdate)SELECTdate,ARRAY(SELECTASSTRUCTtitle,scoreFROMUNNEST(titles)ORDERBYscoreDESCLIMIT2)AStop_articlesFROMTitlesAndScoresORDERBYdateDESC;

TheWITH clause definesTitlesAndScores, which contains two columns. Thefirst is an array of structs, where one field is an article title and the secondis a score. TheARRAY_AGG expression returns an array of these structs foreach day.

TheSELECT statement following theWITH clause uses anARRAY subquery tosort and return the top two articles within each array in accordance with thescore, then returns the results in descending order by date.

For more information about arrays andARRAY subqueries, seeWorking with arrays. See also thereferences forarraysandstructs.## GoogleSQL highlights

This section discusses some of the highlights of GoogleSQL compared tolegacy SQL.

Syntax differences

Escaping reserved keywords and invalid identifiers

In legacy SQL, you escape reserved keywords and identifiers that containinvalid characters such as a space or hyphen- using square brackets[].In GoogleSQL, you escape such keywords and identifiers using backticks`. For example:

#standardSQLSELECTword,SUM(word_count)ASword_countFROM`bigquery-public-data.samples.shakespeare`WHEREwordIN('me','I','you')GROUPBYword;

Legacy SQL allows reserved keywords in some places that GoogleSQL does not.For example, the following query fails due to aSyntax error using standardSQL:

#standardSQLSELECTCOUNT(*)ASrowsFROM`bigquery-public-data.samples.shakespeare`;

To fix the error, escape the aliasrows using backticks:

#standardSQLSELECTCOUNT(*)AS`rows`FROM`bigquery-public-data.samples.shakespeare`;

For a list of reserved keywords and what constitutes valid identifiers, seeLexical structure.

Project-qualified table names

In legacy SQL, to query a table with a project-qualified name, you use acolon,:, as a separator. For example:

#legacySQLSELECTwordFROM[bigquery-public-data:samples.shakespeare]LIMIT1;

In GoogleSQL, you use a period,., instead. For example:

#standardSQLSELECTwordFROM`bigquery-public-data.samples.shakespeare`LIMIT1;

If your project name includes a domain, such asexample.com:myproject, you useexample.com:myproject as the project name, including the:.

Table decorators

GoogleSQL does not supporttable decorators.You can achieve the semantics of time decorators (formerly known assnapshotdecorators) by using theFOR SYSTEM_TIME AS OFclause, which references the historical version of a table at a specifiedtimestamp. For more information, seeAccessing historical data using time travel.

There is no exact equivalent to range decorators in GoogleSQL. You canachieve similar semantics by creating a time-partitioned table and using apartition filter when querying data. For more information, seeQuerying partitioned tables.Another option is to create date-sharded tables and filter on the_TABLE_SUFFIX pseudocolumn. For more information, seeWildcard tables.

Wildcard functions

GoogleSQL does not support theTABLE_DATE_RANGE,TABLE_DATE_RANGE_STRICT,orTABLE_QUERY functions.

You can achieve the same semantics ofTABLE_DATE_RANGE andTABLE_QUERY usinga filter on the_TABLE_SUFFIX pseudocolumn. For example, consider thefollowing legacy SQL query, which counts the number of rows across 2010 and 2011in the National Oceanic and Atmospheric Administration GSOD (global summary ofthe day) tables:

#legacySQLSELECTCOUNT(*)FROMTABLE_QUERY([bigquery-public-data:noaa_gsod],'table_id IN ("gsod2010", "gsod2011")');

An equivalent query using GoogleSQL is:

#standardSQLSELECTCOUNT(*)FROM`bigquery-public-data.noaa_gsod.*`WHERE_TABLE_SUFFIXIN("gsod2010","gsod2011");

For more information, including examples that useTABLE_DATE_RANGE, seeMigrating legacy SQL table wildcard functions.

Comma operator with tables

In legacy SQL, the comma operator, has the non-standard meaning ofUNION ALL when applied to tables. In GoogleSQL, the comma operator has thestandard meaning ofJOIN. For example, consider the following legacy SQLquery:

#legacySQLSELECTx,yFROM(SELECT1ASx,"foo"ASy),(SELECT2ASx,"bar"ASy);

This is equivalent to the GoogleSQL query:

#standardSQLSELECTx,yFROM(SELECT1ASx,"foo"ASyUNIONALLSELECT2ASx,"bar"ASy);

Note also that in GoogleSQL,UNION ALL associates columns by positionrather than by name. This query is equivalent to:

#standardSQLSELECTx,yFROM(SELECT1ASx,"foo"ASyUNIONALLSELECT2,"bar");

A common usage of the comma operator in GoogleSQL is toJOIN with an array.For example:

#standardSQLWITHTAS(SELECT0ASx,[1,2,3]ASarrUNIONALLSELECT1,[4,5])SELECTx,yFROMT,UNNEST(arr)ASy;

This returns the cross product of the tableT with the elements ofarr. Youcan also express the query in GoogleSQL as:

#standardSQLWITHTAS(SELECT0ASx,[1,2,3]ASarrUNIONALLSELECT1,[4,5])SELECTx,yFROMTJOINUNNEST(arr)ASy;

In this query,JOIN has the same meaning as the, comma operator separatingT andUNNEST(arr) AS y in the example above it.

Logical views

You cannot query a logical view defined with legacy SQL using GoogleSQL andvice versa due to differences in syntax and semantics between the dialects.Instead, you would need to create a new view that uses GoogleSQL--possiblyunder a different name--to replace a view that uses legacy SQL.

As an example, suppose that viewV is defined using legacy SQL as:

#legacySQLSELECT*,UTC_USEC_TO_DAY(timestamp_col)ASdayFROMMyTable;

Suppose that viewW is defined using legacy SQL as:

#legacySQLSELECTuser,action,dayFROMV;

Suppose that you execute the following legacy SQL query daily, but you want tomigrate it to use GoogleSQL instead:

#legacySQLSELECTEXACT_COUNT_DISTINCT(user),action,dayFROMWGROUPBYaction,day;

One possible migration path is to create new views using different names. Thesteps involved are:

Create a view namedV2 using GoogleSQL with the following contents:

#standardSQLSELECT*,EXTRACT(DAYFROMtimestamp_col)ASdayFROMMyTable;

Create a view namedW2 using GoogleSQL with the following contents:

#standardSQLSELECTuser,action,dayFROMV2;

Change your query that executes daily to use GoogleSQL and refer toW2instead:

#standardSQLSELECTCOUNT(DISTINCTuser),action,dayFROMW2GROUPBYaction,day;

Another option is to delete viewsV andW, then recreate them using standardSQL under the same names. With this option, you would need to migrate all ofyour queries that referenceV orW to use GoogleSQL at the same time,however.

Function comparison

The following is a partial list of legacy SQL functions and their GoogleSQLequivalents.

Legacy SQLGoogleSQLNotes
INTEGER(x)SAFE_CAST(x AS INT64)
CAST(x AS INTEGER)SAFE_CAST(x AS INT64)
DATEDIFF(t1, t2)DATE_DIFF(DATE(t1), DATE(t2), DAY)
NOW()UNIX_MICROS(CURRENT_TIMESTAMP())
STRFTIME_UTC_USEC(ts_usec, fmt)FORMAT_TIMESTAMP(fmt, TIMESTAMP_MICROS(ts_usec))
UTC_USEC_TO_DAY(ts_usec)UNIX_MICROS(TIMESTAMP_TRUNC(TIMESTAMP_MICROS(ts_usec), DAY))
UTC_USEC_TO_WEEK(ts_usec, day_of_week)UNIX_MICROS(TIMESTAMP_ADD(TIMESTAMP_TRUNC(TIMESTAMP_MICROS(ts_usec), WEEK),INTERVAL day_of_week DAY))
REGEXP_MATCH(s, pattern)REGEXP_CONTAINS(s, pattern)
IS_NULL(x)x IS NULL
LEFT(s, len)SUBSTR(s, 0, len)
RIGHT(s, len)SUBSTR(s, -len)
s CONTAINS "foo"STRPOS(s, "foo") > 0 ors LIKE '%foo%'
INSTR(s, "foo")STRPOS(s, "foo")
x % yMOD(x, y)
NEST(x)ARRAY_AGG(x)
GROUP_CONCAT_UNQUOTED(s, sep)STRING_AGG(s, sep)
SOME(x)IFNULL(LOGICAL_OR(x), false)
EVERY(x)IFNULL(LOGICAL_AND(x), true)
COUNT(DISTINCT x)APPROX_COUNT_DISTINCT(x)seenotes below
EXACT_COUNT_DISTINCT(x)COUNT(DISTINCT x)seenotes below
QUANTILES(x, boundaries)APPROX_QUANTILES(x, boundaries-1)output needs to beunnested
TOP(x, num), COUNT(*)APPROX_TOP_COUNT(x, num)
NTH(index, arr) WITHIN RECORDarr[SAFE_ORDINAL(index)]
COUNT(arr) WITHIN RECORDARRAY_LENGTH(arr)
HOST(url)NET.HOST(url)seedifferences below
TLD(url)NET.PUBLIC_SUFFIX(url)seedifferences below
DOMAIN(url)NET.REG_DOMAIN(url)seedifferences below
PARSE_IP(addr_string)NET.IPV4_TO_INT64(NET.IP_FROM_STRING(addr_string))
FORMAT_IP(addr_int64)NET.IP_TO_STRING(NET.IPV4_FROM_INT64(addr_int64 & 0xFFFFFFFF))
PARSE_PACKED_IP(addr_string)NET.IP_FROM_STRING(addr_string)
FORMAT_PACKED_IP(addr_bytes)NET.IP_TO_STRING(addr_bytes)
NVL(expr, null_default)IFNULL(expr, null_default)

For more information about GoogleSQL functions, seeAll functions.

COUNT function comparison

Both legacy SQL and GoogleSQL contain a COUNT function. However, eachfunction behaves differently, depending on the SQL dialect you use.

In legacy SQL,COUNT(DISTINCT x) returns an approximate count. In standardSQL, it returns an exact count. For an approximate count of distinct valuesthat runs faster and requires fewer resources, useAPPROX_COUNT_DISTINCT.

URL function comparison

Both legacy SQL and GoogleSQL contain functions for parsing URLs. In legacySQL, these functions areHOST(url),TLD(url), andDOMAIN(url). In standardSQL, these functions areNET.HOST(url),NET.PUBLIC_SUFFIX(url),andNET.REG_DOMAIN(url).

Improvements over legacy SQL functions

  • GoogleSQL URL functions can parse URLs starting with "//".
  • When the input is not compliant withRFC 3986 or is not a URL(for example, "mailto:?to=&subject=&body="), different rules are applied toparse the input. In particular, GoogleSQL URL functions can parsenon-standard inputs without "//", such as "www.google.com". Forbest results, it is recommended that you ensure that inputs are URLs andcomply with RFC 3986.
  • NET.PUBLIC_SUFFIX returns results without leading dots. For example, itreturns "com" instead of ".com". This complies with the format in thepublic suffix list.
  • NET.PUBLIC_SUFFIX andNET.REG_DOMAIN support uppercase letters andinternationalized domain names.TLD andDOMAIN do not support them (mightreturn unexpected results).

Minor differences on edge cases

  • If the input does not contain any suffix in thepublic suffix list,NET.PUBLIC_SUFFIX andNET.REG_DOMAIN return NULL, whileTLD andDOMAIN return non-NULL values as best effort guesses.
  • If the input contains only a public suffix without a preceding label (forexample, "http://com"),NET.PUBLIC_SUFFIX returns the public suffix, whileTLD returns an empty string. Similarly,NET.REG_DOMAIN returns NULL, whileDOMAIN returns the public suffix.
  • For inputs with IPv6 hosts,NET.HOST does not remove brackets from theresult, as specified byRFC 3986.
  • For inputs with IPv4 hosts,NET.REG_DOMAIN returns NULL, whileDOMAINreturns the first 3 octets.

Examples

In the table below, indicates resultsthat are the same between legacy and GoogleSQL.

URL (description)HOSTNET.HOSTTLDNET.PUBLIC _SUFFIXDOMAINNET.REG_DOMAIN
"//google.com"
(starting with "//")
NULL"google.com"NULL"com"NULL"google.com"
"google.com"
(non-standard; no "//")
NULL"google.com"NULL"com"NULL"google.com"
"http://user:pass@word@x.com"
(non-standard with multiple "@")
"word@x.com""x.com"".com""com""word@x.com""x.com"
"http://foo.com:1:2"
(non-standard with multiple ":")
"foo.com:1""foo.com"".com:1""com"
"http://x.Co.uk"
(upper case letters)
".uk""Co.uk""Co.uk""x.Co.uk"
"http://a.b"
(public suffix not found)
".b"NULL"a.b"NULL
"http://com"
(host contains only a public suffix)
"""com""com"NULL
"http://[::1]"
(IPv6 host; no public suffix)
"::1""[::1]"""NULL"::1"NULL
"http://1.2.3.4"
(IPv4 host; no public suffix)
""NULL"1.2.3"NULL

Differences in repeated field handling

AREPEATED type in legacy SQL is equivalent to anARRAY of that type inGoogleSQL. For example,REPEATED INTEGER is equivalent toARRAY<INT64> inGoogleSQL. The following section discusses some of the differences inoperations on repeated fields between legacy and GoogleSQL.

NULL elements andNULL arrays

GoogleSQL supportsNULL array elements, but raises an error if there is aNULL array element in the query result. If there is aNULL array column inthe query result, GoogleSQL stores it as an empty array.

Selecting nested repeated leaf fields

Using legacy SQL, you can "dot" into a nested repeated field without needing toconsider where the repetition occurs. In GoogleSQL, attempting to "dot" intoa nested repeated field results in an error. For example:

#standardSQLSELECTrepository.url,payload.pages.page_nameFROM`bigquery-public-data.samples.github_nested`LIMIT5;

Attempting to execute this query returns:

Cannotaccessfieldpage_nameonavaluewithtypeARRAY<STRUCT<actionSTRING,html_urlSTRING,page_nameSTRING,...>>

To correct the error and return an array ofpage_names in the result, use anARRAY subquery instead. For example:

#standardSQLSELECTrepository.url,ARRAY(SELECTpage_nameFROMUNNEST(payload.pages))ASpage_namesFROM`bigquery-public-data.samples.github_nested`LIMIT5;

For more information about arrays andARRAY subqueries, seeWorking with arrays.

Filtering repeated fields

Using legacy SQL, you can filter repeated fields directly using aWHEREclause. In GoogleSQL, you can express similar logic with aJOIN commaoperator followed by a filter. For example, consider the following legacy SQLquery:

#legacySQLSELECTpayload.pages.titleFROM[bigquery-public-data:samples.github_nested]WHEREpayload.pages.page_nameIN('db_jobskill','Profession');

This query returns alltitles of pages for which thepage_name is eitherdb_jobskill orProfession. You can express a similar query in GoogleSQLas:

#standardSQLSELECTpage.titleFROM`bigquery-public-data.samples.github_nested`,UNNEST(payload.pages)ASpageWHEREpage.page_nameIN('db_jobskill','Profession');

One difference between the preceding legacy SQL and GoogleSQL queries is thatif you unset theFlatten Results option and execute the legacy SQL query,payload.pages.title isREPEATED in the query result. To achieve thesame semantics in GoogleSQL and return an array for thetitle column, useanARRAY subquery instead:

#standardSQLSELECTtitleFROM(SELECTARRAY(SELECTtitleFROMUNNEST(payload.pages)WHEREpage_nameIN('db_jobskill','Profession'))AStitleFROM`bigquery-public-data.samples.github_nested`)WHEREARRAY_LENGTH(title) >0;

This query creates an array oftitles where thepage_name is either'db_jobskill' or'Profession', then filters any rows where the array did notmatch that condition usingARRAY_LENGTH(title) > 0.

For more information about arrays, seeWorking with arrays.

Structure of selected nested leaf fields

Legacy SQL preserves the structure of nested leaf fields in theSELECT listwhen theFlatten Results option is unset, whereas GoogleSQL does not. Forexample, consider the following legacy SQL query:

#legacySQLSELECTrepository.url,repository.has_downloadsFROM[bigquery-public-data.samples.github_nested]LIMIT5;

This query returnsurl andhas_downloads within a record namedrepositorywhenFlatten Results is unset. Now consider the following GoogleSQL query:

#standardSQLSELECTrepository.url,repository.has_downloadsFROM`bigquery-public-data.samples.github_nested`LIMIT5;

This query returnsurl andhas_downloads as top-level columns; they are notpart of arepository record or struct. To return them as part of a struct, usetheSTRUCT operator:

#standardSQLSELECTSTRUCT(repository.url,repository.has_downloads)ASrepositoryFROM`bigquery-public-data.samples.github_nested`LIMIT5;

Removing repetition withFLATTEN

GoogleSQL does not have aFLATTEN function as in legacy SQL, but you canachieve similar semantics using theJOIN (comma) operator. For example,consider the following legacy SQL query:

#legacySQLSELECTrepository.url,payload.pages.page_nameFROMFLATTEN([bigquery-public-data:samples.github_nested],payload.pages.page_name)LIMIT5;

You can express a similar query in GoogleSQL as follows:

#standardSQLSELECTrepository.url,page.page_nameFROM`bigquery-public-data.samples.github_nested`,UNNEST(payload.pages)ASpageLIMIT5;

Or, equivalently, useJOIN rather than the comma, operator:

#standardSQLSELECTrepository.url,page.page_nameFROM`bigquery-public-data.samples.github_nested`JOINUNNEST(payload.pages)ASpageLIMIT5;

One important difference is that the legacy SQL query returns a row wherepayload.pages.page_name isNULL ifpayload.pages is empty. The standardSQL query, however, does not return a row ifpayload.pages is empty. Toachieve exactly the same semantics, use aLEFT JOIN orLEFT OUTER JOIN. Forexample:

#standardSQLSELECTrepository.url,page.page_nameFROM`bigquery-public-data.samples.github_nested`LEFTJOINUNNEST(payload.pages)ASpageLIMIT5;

For more information about arrays, seeWorking with arrays. For moreinformation aboutUNNEST, see theUNNEST topic.

Filtering rows withOMIT RECORD IF

TheOMIT IF clause from legacy SQL lets you filter rows based on acondition that can apply to repeated fields. In GoogleSQL, you can modelanOMIT IF clause with anEXISTS clause,IN clause, or simple filter.For example, consider the following legacy SQL query:

#legacySQLSELECTrepository.url,FROM[bigquery-public-data:samples.github_nested]OMITRECORDIFEVERY(payload.pages.page_name!='db_jobskill'ANDpayload.pages.page_name!='Profession');

The analogous GoogleSQL query is:

#standardSQLSELECTrepository.urlFROM`bigquery-public-data.samples.github_nested`WHEREEXISTS(SELECT1FROMUNNEST(payload.pages)WHEREpage_name='db_jobskill'ORpage_name='Profession');

Here theEXISTS clause evaluates totrue if there is at least one element ofpayload.pages where the page name is'db_jobskill' or'Profession'.

Alternatively, suppose that the legacy SQL query usesIN:

#legacySQLSELECTrepository.url,FROM[bigquery-public-data:samples.github_nested]OMITRECORDIFNOTSOME(payload.pages.page_nameIN('db_jobskill','Profession'));

In GoogleSQL, you can express the query using anEXISTS clause withIN:

#standardSQLSELECTrepository.urlFROM`bigquery-public-data.samples.github_nested`WHEREEXISTS(SELECT1FROMUNNEST(payload.pages)WHEREpage_nameIN('db_jobskill','Profession'));

Consider the following legacy SQL query that filters records with 80 or fewerpages:

#legacySQLSELECTrepository.url,FROM[bigquery-public-data:samples.github_nested]OMITRECORDIFCOUNT(payload.pages.page_name)<=80;

In this case, you can use a filter withARRAY_LENGTH in GoogleSQL:

#standardSQLSELECTrepository.urlFROM`bigquery-public-data.samples.github_nested`WHEREARRAY_LENGTH(payload.pages) >80;

Note that theARRAY_LENGTH function applies to the repeatedpayload.pagesfield directly rather than the nested fieldpayload.pages.page_name as in thelegacy SQL query.

For more information about arrays andARRAY subqueries, seeWorking with arrays.

Semantic differences

The semantics of some operations differ between legacy and GoogleSQL.

Automatic data type coercions

Both legacy and GoogleSQL support coercions (automatic conversions) betweencertain data types. For example, BigQuery coerces a value of typeINT64 toFLOAT64 if the query passes it to a function that requiresFLOAT64 as input.GoogleSQL does not support the following coercions that legacy SQL supports.Instead, you must use an explicitCAST.

  • INT64 literal toTIMESTAMP. Instead, useTIMESTAMP_MICROS(micros_value).
  • STRING literal toINT64,FLOAT64, orBOOL. Instead, useCAST(str AS INT64),CAST(str AS FLOAT64), orCAST(str AS BOOL).
  • STRING toBYTES. Instead, useCAST(str AS BYTES).

Runtime errors

Some functions in legacy SQL returnNULL for invalid input, potentiallymasking problems in queries or in data. GoogleSQL is generally more strict,and raises an error if an input is invalid.

  • For all mathematical functions and operators, legacy SQL does not check foroverflows. GoogleSQL adds overflow checks, and raises an error if acomputation overflows. This includes the+,-,* operators, theSUM,AVG, andSTDDEV aggregate functions, and others.
  • GoogleSQL raises an error upon division by zero, whereas legacy SQL returnsNULL. To returnNULL for division by zero in GoogleSQL, useSAFE_DIVIDE.
  • GoogleSQL raises an error forCASTs where the input format is invalid orout of range for the target type, whereas legacy SQL returnsNULL. To avoidraising an error for an invalid cast in GoogleSQL, useSAFE_CAST.

Nested repeated results

Queries executed using GoogleSQL preserve any nesting and repetition of thecolumns in the result, and theFlatten Results option has no effect. Toreturn top-level columns for nested fields, use the.* operator on structcolumns. For example:

#standardSQLSELECTrepository.*FROM`bigquery-public-data.samples.github_nested`LIMIT5;

To return top-level columns for repeated nested fields (ARRAYs ofSTRUCTs),use aJOIN to take the cross product of the table's rows and the elements ofthe repeated nested field. For example:

#standardSQLSELECTrepository.url,page.*FROM`bigquery-public-data.samples.github_nested`JOINUNNEST(payload.pages)ASpageLIMIT5;

For more information about arrays andARRAY subqueries, seeWorking with arrays.

NOT IN conditions and NULL

Legacy SQL does not comply with the SQL standard in its handling ofNULL withNOT IN conditions, whereas GoogleSQL does. Consider the following legacySQL query, which finds the number of words that don't appear in the GitHubsample table as locations:

#legacySQLSELECTCOUNT(*)FROM[bigquery-public-data.samples.shakespeare]WHEREwordNOTIN(SELECTactor_attributes.locationFROM[bigquery-public-data.samples.github_nested]);

This query returns 163,716 as the count, indicating that there are 163,716 wordsthat don't appear as locations in the GitHub table. Now consider the followingGoogleSQL query:

#standardSQLSELECTCOUNT(*)FROM`bigquery-public-data.samples.shakespeare`WHEREwordNOTIN(SELECTactor_attributes.locationFROM`bigquery-public-data.samples.github_nested`);

This query returns 0 as the count. The difference is due to the semantics ofNOT IN with GoogleSQL, which returnsNULL if any value on the right handside isNULL. To achieve the same results as with the legacy SQL query, use aWHERE clause to exclude theNULL values:

#standardSQLSELECTCOUNT(*)FROM`bigquery-public-data.samples.shakespeare`WHEREwordNOTIN(SELECTactor_attributes.locationFROM`bigquery-public-data.samples.github_nested`WHEREactor_attributes.locationISNOTNULL);

This query returns 163,716 as the count. Alternatively, use aNOT EXISTScondition:

#standardSQLSELECTCOUNT(*)FROM`bigquery-public-data.samples.shakespeare`AStWHERENOTEXISTS(SELECT1FROM`bigquery-public-data.samples.github_nested`WHEREt.word=actor_attributes.location);

This query also returns 163,716 as the count. For further reading, see thecomparison operatorssection of the documentation, which explains the semantics ofIN,NOT IN,EXISTS, and other comparison operators.

Differences in user-defined JavaScript functions

TheUser-defined functionstopic documents how to use JavaScript user-defined functions with GoogleSQL.This section explains some of the key differences between user-defined functionsin legacy and GoogleSQL.

Functions in the query text

With GoogleSQL, you useCREATE TEMPORARY FUNCTION as part of the query bodyrather than specifying user-defined functions separately. Examples of definingfunctions separately include using the UDF Editor in the Google Cloud consoleor using the--udf_resource flag inthe bq command-line tool.

Consider the following GoogleSQL query:

#standardSQL-- Computes the harmonic mean of the elements in 'arr'.-- The harmonic mean of x_1, x_2, ..., x_n can be expressed as:--   n / ((1 / x_1) + (1 / x_2) + ... + (1 / x_n))CREATETEMPORARYFUNCTIONHarmonicMean(arrARRAY<FLOAT64>)RETURNSFLOAT64LANGUAGEjsAS"""var sum_of_reciprocals = 0;for (var i = 0; i < arr.length; ++i) {  sum_of_reciprocals += 1 / arr[i];}return arr.length / sum_of_reciprocals;""";WITHTAS(SELECTGENERATE_ARRAY(1.0,x*4,x)ASarrFROMUNNEST([1,2,3,4,5])ASx)SELECTarr,HarmonicMean(arr)ASh_meanFROMT;

This query defines a JavaScript function namedHarmonicMean and then appliesit to the array columnarr fromT.

For more information about user-defined functions, see theUser-defined functions topic.

Functions operate on values rather than rows

In legacy SQL, JavaScript functions operate on rows from a table. In standardSQL, as in the example above, JavaScript functions operate on values. To pass arow value to a JavaScript function using GoogleSQL, define a function thattakes a struct of the same row type as the table. For example:

#standardSQL-- Takes a struct of x, y, and z and returns a struct with a new field foo.CREATETEMPORARYFUNCTIONAddField(sSTRUCT<xFLOAT64,yBOOL,zSTRING>)RETURNSSTRUCT<xFLOAT64,yBOOL,zSTRING,fooSTRING>LANGUAGEjsAS"""var new_struct = new Object();new_struct.x = s.x;new_struct.y = s.y;new_struct.z = s.z;if (s.y) {  new_struct.foo = 'bar';} else {  new_struct.foo = 'baz';}return new_struct;""";WITHTAS(SELECTx,MOD(off,2)=0ASy,CAST(xASSTRING)ASzFROMUNNEST([5.0,4.0,3.0,2.0,1.0])ASxWITHOFFSEToff)SELECTAddField(t).*FROMTASt;

This query defines a JavaScript function that takes a struct with the same rowtype asT and creates a new struct with an additional field namedfoo. TheSELECT statement passes the rowt as input to the function and uses.* toreturn the fields of the resulting struct in the output.

Migrating legacy SQL table wildcard functions

In legacy SQL, you can use the followingtable wildcard functionsto query multiple tables.

  • TABLE_DATE_RANGE() andTABLE_DATE_RANGE_STRICT()
  • TABLE_QUERY()

The TABLE_DATE_RANGE() functions

The legacy SQLTABLE_DATE_RANGE() functions work on tables that conform to aspecific naming scheme:<prefix>YYYYMMDD, where the<prefix> represents thefirst part of a table name andYYYYMMDD represents the date associated withthat table's data.

For example, the following legacy SQL query finds the average temperature froma set of daily tables that contain Seattle area weather data:

#legacySQLSELECTROUND(AVG(TemperatureF),1)ASAVG_TEMP_FFROMTABLE_DATE_RANGE([mydataset.sea_weather_],TIMESTAMP("2016-05-01"),TIMESTAMP("2016-05-09"))

In GoogleSQL, an equivalent query uses a table wildcard and theBETWEENclause.

#standardSQLSELECTROUND(AVG(TemperatureF),1)ASAVG_TEMP_FFROM`mydataset.sea_weather_*`WHERE_TABLE_SUFFIXBETWEEN'20160501'AND'20160509'

The TABLE_QUERY() function

The legacy SQLTABLE_QUERY() function enables you to find table names basedon patterns. When migrating aTABLE_QUERY() function to GoogleSQL, whichdoes not support theTABLE_QUERY() function, you can instead filter usingthe_TABLE_SUFFIX pseudocolumn. Keep the following differences in mind whenmigrating:

  • In legacy SQL, you place theTABLE_QUERY() function in theFROM clause,whereas in GoogleSQL, you filter using the_TABLE_SUFFIX pseudocolumn intheWHERE clause.

  • In legacy SQL, theTABLE_QUERY() function operates on the entire table name(ortable_id), whereas in GoogleSQL, the_TABLE_SUFFIX pseudocolumncontains part or all of the table name, depending on how you use the wildcardcharacter.

Filter in the WHERE clause

When migrating from legacy SQL to GoogleSQL, move the filter to theWHEREclause. For example, the following query finds the maximum temperatures acrossall years that end in the number0:

#legacySQLSELECTmax,ROUND((max-32)*5/9,1)celsius,yearFROMTABLE_QUERY([bigquery-public-data:noaa_gsod],'REGEXP_MATCH(table_id, r"0$")')WHEREmax!=9999.9# code for missing dataANDmax >100# to improve ORDER BY performanceORDERBYmaxDESC

In GoogleSQL, an equivalent query uses a table wildcard and places theregular expression function,REGEXP_CONTAINS(), in theWHERE clause:

#standardSQLSELECTmax,ROUND((max-32)*5/9,1)celsius,yearFROM`bigquery-public-data.noaa_gsod.gsod*`WHEREmax!=9999.9# code for missing dataANDmax >100# to improve ORDER BY performanceANDREGEXP_CONTAINS(_TABLE_SUFFIX,r"0$")ORDERBYmaxDESC

Differences between table_id and _TABLE_SUFFIX

In the legacy SQLTABLE_QUERY(dataset, expr) function, the second parameteris an expression that operates over the entire table name, using the valuetable_id. When migrating to GoogleSQL, the filter that you create in theWHERE clause operates on the value of_TABLE_SUFFIX, which can include partor all of the table name, depending on your use of the wildcard character.

For example, the following legacy SQL query uses the entire table name in aregular expression to find the maximum temperatures across all years that endin the number0:

#legacySQLSELECTmax,ROUND((max-32)*5/9,1)celsius,yearFROMTABLE_QUERY([bigquery-public-data:noaa_gsod],'REGEXP_MATCH(table_id, r"gsod\d{3}0")')WHEREmax!=9999.9# code for missing dataANDmax >100# to improve ORDER BY performanceORDERBYmaxDESC

In GoogleSQL, an equivalent query can use the entire table name or only apart of the table name. You can use an empty prefix in GoogleSQL so thatyour filter operates over the entire table name:

# Empty prefixFROM`bigquery-public-data.noaa_gsod.*`

However, longer prefixes perform better than empty prefixes, so the followingexample uses a longer prefix, which means that the value of_TABLE_SUFFIXis only part of the table name.

#standardSQLSELECTmax,ROUND((max-32)*5/9,1)celsius,yearFROM`bigquery-public-data.noaa_gsod.gsod*`WHEREmax!=9999.9# code for missing dataANDmax >100# to improve ORDER BY performanceANDREGEXP_CONTAINS(_TABLE_SUFFIX,r"\d{3}0")ORDERBYmaxDESC

Except as otherwise noted, the content of this page is licensed under theCreative Commons Attribution 4.0 License, and code samples are licensed under theApache 2.0 License. For details, see theGoogle Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-11-24 UTC.