Migrating to GoogleSQL

BigQuery supports two SQL dialects:GoogleSQL andlegacy SQL. This document explains thedifferences between the two dialects, includingsyntax,functions, andsemantics,and gives examples of some of thehighlights of GoogleSQL.

Comparison of legacy and GoogleSQL

When initially released, BigQuery ran queries using anon-GoogleSQL dialect known asBigQuerySQL. With the launch ofBigQuery 2.0, BigQuery released support forGoogleSQL, andrenamed BigQuery SQL to legacy SQL. GoogleSQL is thepreferred SQL dialect for querying data stored in BigQuery.

Do I have to migrate to GoogleSQL?

We recommend migrating from legacy SQL to GoogleSQL, but it's not required. Forexample, suppose that you execute many queries that use legacy SQL, but you wantto take advantage of a GoogleSQL feature for a new query. You can create newqueries using GoogleSQL that run alongside queries using legacy SQL.

Enabling GoogleSQL

You have a choice of whether to use legacy or GoogleSQL when you run aquery. For information about switching between SQL dialects, seeBigQuery SQL dialects.

Advantages of GoogleSQL

GoogleSQL complies with the SQL 2011 standard, and has extensions thatsupport querying nested and repeated data. It has several advantages over legacySQL, including:

Composability usingWITH clauses andSQL functions
Subqueries in theSELECT list andWHERE clause
Correlated subqueries
ARRAY andSTRUCT data types
Inserts, updates, and deletes
COUNT(DISTINCT <expr>) is exact and scalable, providing the accuracy ofEXACT_COUNT_DISTINCT without its limitations
Automatic predicate push-down throughJOINs
ComplexJOIN predicates, including arbitrary expressions

For examples that demonstrate some of these features, seeGoogleSQL highlights.

GoogleSQL highlights

This section discusses some of the highlights of GoogleSQL compared tolegacy SQL.

Composability using`WITH` clauses

Some of the GoogleSQL examples on this page make use of aWITH clause,which enables extraction or reuse of named subqueries. For example:

#standardSQLWITHTAS(SELECTxFROMUNNEST([1,2,3,4])ASx)SELECTx/(SELECTSUM(x)FROMT)ASweighted_xFROMT;

This query defines a named subqueryT that containsx values of 1, 2, 3,and 4. It selectsx values fromT and divides them by the sum of allxvalues inT. This query is equivalent to a query where the contents ofTare inline:

#standardSQLSELECTx/(SELECTSUM(x)FROM(SELECTxFROMUNNEST([1,2,3,4])ASx))ASweighted_xFROM(SELECTxFROMUNNEST([1,2,3,4])ASx);

As another example, consider this query, which uses multiple named subqueries:

#standardSQLWITHTAS(SELECTxFROMUNNEST([1,2,3,4])ASx),TPlusOneAS(SELECTx+1ASyFROMT),TPlusOneTimesTwoAS(SELECTy*2ASzFROMTPlusOne)SELECTzFROMTPlusOneTimesTwo;

This query defines a sequence of transformations of the original data, followedby aSELECT statement overTPlusOneTimesTwo. This query is equivalent to thefollowing query, which inlines the computations:

#standardSQLSELECT(x+1)*2ASzFROM(SELECTxFROMUNNEST([1,2,3,4])ASx);

For more information, seeWITH clause.

Composability using SQL functions

GoogleSQL supportsuser-defined SQL functions.You can use user-defined SQL functions to define common expressions and thenreference them from the query. For example:

#standardSQL-- Computes the harmonic mean of the elements in 'arr'.-- The harmonic mean of x_1, x_2, ..., x_n can be expressed as:--   n / ((1 / x_1) + (1 / x_2) + ... + (1 / x_n))CREATETEMPORARYFUNCTIONHarmonicMean(arrARRAY<FLOAT64>)AS(ARRAY_LENGTH(arr)/(SELECTSUM(1/x)FROMUNNEST(arr)ASx));WITHTAS(SELECTGENERATE_ARRAY(1.0,x*4,x)ASarrFROMUNNEST([1,2,3,4,5])ASx)SELECTarr,HarmonicMean(arr)ASh_meanFROMT;

This query defines a SQL function namedHarmonicMean and then applies it tothe array columnarr fromT.

Subqueries in more places

GoogleSQL supports subqueries in theSELECT list,WHERE clause, andanywhere else in the query that expects an expression. For example, consider thefollowing GoogleSQL query that computes the fraction of warm days in Seattlein 2015:

#standardSQLWITHSeattleWeatherAS(SELECT*FROM`bigquery-public-data.noaa_gsod.gsod2015`WHEREstn='994014')SELECTCOUNTIF(max>=70)/(SELECTCOUNT(*)FROMSeattleWeather)ASwarm_days_fractionFROMSeattleWeather;

The Seattle weather station has an ID of'994014'. The query computes thenumber of warm days based on those where the temperature reached 70 degreesFahrenheit, or approximately 21 degrees Celsius, divided by the total number ofrecorded days for that station in 2015.

Correlated subqueries

In GoogleSQL, subqueries can reference correlated columns; that is, columnsthat originate from the outer query. For example, consider the followingGoogleSQL query:

#standardSQLWITHWashingtonStationsAS(SELECTweather.stnASstation_id,ANY_VALUE(station.name)ASnameFROM`bigquery-public-data.noaa_gsod.stations`ASstationINNERJOIN`bigquery-public-data.noaa_gsod.gsod2015`ASweatherONstation.usaf=weather.stnWHEREstation.state='WA'ANDstation.usaf!='999999'GROUPBYstation_id)SELECTwashington_stations.name,(SELECTCOUNT(*)FROM`bigquery-public-data.noaa_gsod.gsod2015`ASweatherWHEREwashington_stations.station_id=weather.stnANDmax>=70)ASwarm_daysFROMWashingtonStationsASwashington_stationsORDERBYwarm_daysDESC;

This query computes the names of weather stations in Washington state and thenumber of days in 2015 that the temperature reached 70 degrees Fahrenheit, orapproximately 21 degrees Celsius. Notice that there is a subquery in theSELECT list, and that the subquery referenceswashington_stations.station_idfrom the outer scope, namelyFROM WashingtonStations AS washington_stations.

Arrays and structs

ARRAY andSTRUCT are powerful concepts in GoogleSQL. As an example thatuses both, consider the following query, which computes the top two articlesfor each day in the HackerNews dataset:

#standardSQLWITHTitlesAndScoresAS(SELECTARRAY_AGG(STRUCT(title,score))AStitles,EXTRACT(DATEFROMtime_ts)ASdateFROM`bigquery-public-data.hacker_news.stories`WHEREscoreISNOTNULLANDtitleISNOTNULLGROUPBYdate)SELECTdate,ARRAY(SELECTASSTRUCTtitle,scoreFROMUNNEST(titles)ORDERBYscoreDESCLIMIT2)AStop_articlesFROMTitlesAndScoresORDERBYdateDESC;

TheWITH clause definesTitlesAndScores, which contains two columns. Thefirst is an array of structs, where one field is an article title and the secondis a score. TheARRAY_AGG expression returns an array of these structs foreach day.

TheSELECT statement following theWITH clause uses anARRAY subquery tosort and return the top two articles within each array in accordance with thescore, then returns the results in descending order by date.

For more information about arrays andARRAY subqueries, seeWorking with arrays. See also thereferences forarraysandstructs.

Data type differences

Legacy SQL types have an equivalent in GoogleSQL. Insome cases, the type has a different name. The following table lists each legacySQL data type and its GoogleSQL equivalent.

Legacy SQL	GoogleSQL	Notes
`BOOLEAN`	`BOOLEAN`
`INTEGER`	`INT64`
`FLOAT`	`FLOAT64`
`NUMERIC`	`NUMERIC`	Legacy SQL has limited support for`NUMERIC`
`BIGNUMERIC`	`BIGNUMERIC`	Legacy SQL has limited support for`BIGNUMERIC`
`STRING`	`STRING`
`BYTES`	`BYTES`
`RECORD`	`STRUCT`
`REPEATED`	`ARRAY`
`TIMESTAMP`	`TIMESTAMP`	See`TIMESTAMP` differences
`DATE`	`DATE`	Legacy SQL has limited support for`DATE`
`TIME`	`TIME`	Legacy SQL has limited support for`TIME`
`DATETIME`	`DATETIME`	Legacy SQL has limited support for`DATETIME`

For more information see:

`TIMESTAMP` type differences

GoogleSQL has astricter range of validTIMESTAMP valuesthan legacy SQL does. In GoogleSQL, validTIMESTAMP values are in therange of0001-01-01 00:00:00.000000 to9999-12-31 23:59:59.999999. Forexample, you can select the minimum and maximumTIMESTAMP values usingGoogleSQL:

#standardSQLSELECTmin_timestamp,max_timestamp,UNIX_MICROS(min_timestamp)ASmin_unix_micros,UNIX_MICROS(max_timestamp)ASmax_unix_microsFROM(SELECTTIMESTAMP'0001-01-01 00:00:00.000000'ASmin_timestamp,TIMESTAMP'9999-12-31 23:59:59.999999'ASmax_timestamp);

This query returns-62135596800000000 asmin_unix_micros and253402300799999999 asmax_unix_micros.

If you select a column that contains timestamp values outside of thisrange, you receive an error:

#standardSQLSELECTtimestamp_column_with_invalid_valuesFROMMyTableWithInvalidTimestamps;

This query returns the following error:

Cannotreturnaninvalidtimestampvalueof-8446744073709551617microsecondsrelativetotheUnixepoch.Therangeofvalidtimestampvaluesis[0001-01-100:00:00,9999-12-3123:59:59.999999]

To correct the error, one option is to define and use auser-defined functionto filter the invalid timestamps:

#standardSQLCREATETEMPFUNCTIONTimestampIsValid(tTIMESTAMP)AS(t>=TIMESTAMP('0001-01-01 00:00:00')ANDt<=TIMESTAMP('9999-12-31 23:59:59.999999'));SELECTtimestamp_column_with_invalid_valuesFROMMyTableWithInvalidTimestampsWHERETimestampIsValid(timestamp_column_with_invalid_values);

Another option to correct the error is to use theSAFE_CASTfunction with the timestamp column. For example:

#standardSQLSELECTSAFE_CAST(timestamp_column_with_invalid_valuesASSTRING)AStimestamp_stringFROMMyTableWithInvalidTimestamps;

This query returnsNULL rather than a timestamp string for invalidtimestamp values.

Automatic data type coercions

Both legacy and GoogleSQL support coercions (automatic conversions) betweencertain data types. For example, BigQuery coerces a value of typeINT64 toFLOAT64 if the query passes it to a function that requiresFLOAT64 as input.

The following legacy SQL coercions are not supported in GoogleSQL and need to be explicitly cast to the correct type:

Coercion	Translation
`BOOLEAN` to`INT64` or`FLOAT64`	Use`SAFE_CAST(bool AS INT64)` or`SAFE_CAST(bool AS FLOAT64)`
`INT64` to`TIMESTAMP`	Use`TIMESTAMP_MICROS(micros_value)`
`STRING` to`BYTES`	Use`SAFE_CAST(str AS BYTES)`
`STRING` to`INT64`,`FLOAT64`, or`BOOL`	Mostly supported by GoogleSQL. For corner cases use`SAFE_CAST(str AS INT64)`,`SAFE_CAST(str AS FLOAT64)`, or`SAFE_CAST(str AS BOOL)`
`STRING` to`TIMESTAMP`	Mostly supported by GoogleSQL. For corner cases use`TIMESTAMP(str)` or`SAFE_CAST(str AS TIMESTAMP)`

For example, the following legacy SQL query uses implicit coercions:

#legacySQLSELECT1+trueasboolean_int_coercion,TIMESTAMP(1234567890)asinteger_timestamp_coercion;

In GoogleSQL, this query is invalid. To achieve the same result, you must use explicit casting:

#standardSQLSELECT1+SAFE_CAST(trueASINT64)asboolean_coercion,TIMESTAMP_MICROS(1234567890)asinteger_timestamp_coercion;

Syntax differences

While GoogleSQL and legacy SQL syntaxes are similar, there are some crucial differences.

Escaping reserved keywords and invalid identifiers

In legacy SQL, you escape reserved keywords and identifiers that containinvalid characters such as a space or hyphen- using square brackets[].In GoogleSQL, you escape such keywords and identifiers using backticks`. For example:

#standardSQLSELECTword,SUM(word_count)ASword_countFROM`bigquery-public-data.samples.shakespeare`WHEREwordIN('me','I','you')GROUPBYword;

Legacy SQL allows reserved keywords in some places that GoogleSQL does not.For example, the following query fails due to aSyntax error using standardSQL:

#standardSQLSELECTCOUNT(*)ASrowsFROM`bigquery-public-data.samples.shakespeare`;

To fix the error, escape the aliasrows using backticks:

#standardSQLSELECTCOUNT(*)AS`rows`FROM`bigquery-public-data.samples.shakespeare`;

The following is a list of keywords allowed in legacy SQL, but not in GoogleSQL:

ALL
AND
ANY
ARRAY
ASSERT_ROWS_MODIFIED
AT
COLLATE
CURRENT
DEFAULT
DESC
END
ENUM

ESCAPE
EXCEPT
EXCLUDE
EXTRACT
FETCH
FOR
GROUP
GROUPING
GROUPS
IF
INTERVAL
IS

LATERAL
NATURAL
NEW
NO
NULLS
OF
ORDER
PROTO
QUALIFY
RANGE
RECURSIVE

RESPECT
ROLLUP
ROWS
SOME
STRUCT
TABLESAMPLE
TO
TREAT
UNNEST
WHEN
WINDOW

For a more comprehensive list of reserved keywords and what constitutes valid identifiers, see theReserved keywords section inLexical structure.

Project-qualified table names

In legacy SQL, to query a table with a project-qualified name, you can use either a colon: or a period..In GoogleSQL however, you must use only periods..

Example legacy SQL query:

#legacySQLSELECTwordFROM[bigquery-public-data:samples.shakespeare]LIMIT1;

The GoogleSQL equivalent is:

#standardSQLSELECTwordFROM`bigquery-public-data.samples.shakespeare`LIMIT1;

If your project name includes a domain, such asexample.com:myproject, you useexample.com:myproject as the project name, including the:.

Table decorators

Table decorators are used to run a query on a subset of data.

Time decorator (snapshot decorator)

You can achieve the semantics of time decorators (formerly known assnapshotdecorators) by using theFOR SYSTEM_TIME AS OFclause, which references the historical version of a table at a specifiedtimestamp. For more information, seeAccessing historical data using time travel.

Range decorator

There is no exact equivalent to range decorators in GoogleSQL. You canachieve similar semantics by creating a time-partitioned table and using apartition filter when querying data. For more information, seeQuerying partitioned tables.

Another option is to create date-sharded tables and filter on the_TABLE_SUFFIX pseudocolumn. For more information, seeGoogleSQL wildcard tables.

Partition decorator

The equivalent of a partition decorator in GoogleSQL is a filter on the partitioning column. For more information, seeQuerying partitioned tables.

For example, consider a legacy SQL query using atable partitioned over the_PARTITIONTIME pseudocolumn:

#legacySQLSELECT*FROMdataset.table$20160501;

The GoogleSQL equivalent is:

#standardSQLSELECT*FROMdataset.tableWHERE_PARTITIONTIME=TIMESTAMP('2016-05-01');

Another option is to filter on the_TABLE_SUFFIX pseudocolumn. For more information, seeGoogleSQL wildcard tables.

Table wildcard functions

In legacy SQL, you can use the followingtable wildcard functionsto query multiple tables.

TABLE_DATE_RANGE
TABLE_DATE_RANGE_STRICT
TABLE_QUERY

These functions are not supported in GoogleSQL, but they can be migrated using a filter on_TABLE_SUFFIX pseudocolumn.

For example, consider the following legacy SQL query, which counts the number of rows across 2010 and 2011in the National Oceanic and Atmospheric Administration GSOD (global summary ofthe day) tables:

#legacySQLSELECTCOUNT(*)FROMTABLE_QUERY([bigquery-public-data:noaa_gsod],'table_id IN ("gsod2010", "gsod2011")');

An equivalent query using GoogleSQL is:

#standardSQLSELECTCOUNT(*)FROM`bigquery-public-data.noaa_gsod.*`WHERE_TABLE_SUFFIXIN("gsod2010","gsod2011");

For more information, including examples of all wildcard functions, seeMigrating table wildcard functions.

Comma operator with tables

In legacy SQL, the comma operator, has the non-standard meaning ofUNION ALL BY NAME when applied to tables. In GoogleSQL, the comma operator has thestandard meaning ofJOIN.

For example, consider the following legacy SQL query:

#legacySQLSELECTx,yFROM(SELECT1ASx,"foo"ASy),(SELECT2ASx,"bar"ASy);

This is equivalent to the GoogleSQL query:

#standardSQLSELECTx,yFROM(SELECT1ASx,"foo"ASyUNIONALLBYNAMESELECT2ASx,"bar"ASy);

Legacy SQL associates columns by name instead of by position. To get the same behavior in GoogleSQL, useUNION ALL BY NAME. For example, the following query shows a translation where the column names and order are swapped:

#legacySQLSELECT*FROM(SELECT1ASA,2asB),(SELECT3ASB,4asA);

The GoogleSQL equivalent will look like:

#standardSQLSELECT*FROM(SELECT1ASA,2asB)UNIONALLBYNAME(SELECT3ASB,4asA);

Where the result in both cases will be:

+---+---+|A|B|+---+---+|1|2||4|3|+---+---+

Logical views

You cannot query a logical view defined with legacy SQL using GoogleSQL, andconversely, you cannot query a logical view defined with GoogleSQL using legacy SQL due to differences in syntax and semantics between the dialects.Instead, you would need to create a new view that uses GoogleSQL -- possiblyunder a different name -- to replace a view that uses legacy SQL.

As an example, suppose that viewV is defined using legacy SQL as:

#legacySQLSELECT*,UTC_USEC_TO_DAY(timestamp_col)ASdayFROMMyTable;

Suppose that viewW is defined using legacy SQL as:

#legacySQLSELECTuser,action,dayFROMV;

Suppose that you execute the following legacy SQL query daily, but you want tomigrate it to use GoogleSQL instead:

#legacySQLSELECTEXACT_COUNT_DISTINCT(user),action,dayFROMWGROUPBYaction,day;

One possible migration path is to create new views using different names. Thesteps involved are:

Create a view namedV2 using GoogleSQL with the following contents:

#standardSQLSELECT*,EXTRACT(DAYFROMtimestamp_col)ASdayFROMMyTable;

Create a view namedW2 using GoogleSQL with the following contents:

#standardSQLSELECTuser,action,dayFROMV2;

Change your query that executes daily to use GoogleSQL and refer toW2instead:

#standardSQLSELECTCOUNT(DISTINCTuser),action,dayFROMW2GROUPBYaction,day;

Another option is to delete viewsV andW, then recreate them using standardSQL under the same names. With this option, you would need to migrate all ofyour queries that referenceV orW to use GoogleSQL at the same time,however.

Function comparison

The following is a partial list of legacy SQL functions and their GoogleSQLequivalents.

Legacy SQL	GoogleSQL	Notes
`INTEGER(x)`	`SAFE_CAST(x AS INT64)`
`CAST(x AS INTEGER)`	`SAFE_CAST(x AS INT64)`
`DATEDIFF(t1, t2)`	`DATE_DIFF(DATE(t1), DATE(t2), DAY)`
`NOW()`	`UNIX_MICROS(CURRENT_TIMESTAMP())`
`STRFTIME_UTC_USEC(ts_usec, fmt)`	`FORMAT_TIMESTAMP(fmt, TIMESTAMP_MICROS(ts_usec))`
`UTC_USEC_TO_DAY(ts_usec)`	`UNIX_MICROS(TIMESTAMP_TRUNC(TIMESTAMP_MICROS(ts_usec), DAY))`
`UTC_USEC_TO_WEEK(ts_usec, day_of_week)`	`UNIX_MICROS(TIMESTAMP_ADD(TIMESTAMP_TRUNC(TIMESTAMP_MICROS(ts_usec), WEEK),INTERVAL day_of_week DAY))`
`REGEXP_MATCH(s, pattern)`	`REGEXP_CONTAINS(s, pattern)`
`IS_NULL(x)`	`x IS NULL`
`LEFT(s, len)`	`SUBSTR(s, 0, len)`
`RIGHT(s, len)`	`SUBSTR(s, -len)`
`s CONTAINS "foo"`	`STRPOS(s, "foo") > 0` or`s LIKE '%foo%'`
`INSTR(s, "foo")`	`STRPOS(s, "foo")`
`x % y`	`MOD(x, y)`
`NEST(x)`	`ARRAY_AGG(x)`
`GROUP_CONCAT_UNQUOTED(s, sep)`	`STRING_AGG(s, sep)`
`SOME(x)`	`IFNULL(LOGICAL_OR(x), false)`
`EVERY(x)`	`IFNULL(LOGICAL_AND(x), true)`
`COUNT(DISTINCT x)`	`APPROX_COUNT_DISTINCT(x)`	seenotes below
`EXACT_COUNT_DISTINCT(x)`	`COUNT(DISTINCT x)`	seenotes below
`QUANTILES(x, boundaries)`	`APPROX_QUANTILES(x, boundaries-1)`	output needs to beunnested
`TOP(x, num), COUNT(*)`	`APPROX_TOP_COUNT(x, num)`
`NTH(index, arr) WITHIN RECORD`	`arr[SAFE_ORDINAL(index)]`
`COUNT(arr) WITHIN RECORD`	`ARRAY_LENGTH(arr)`
`HOST(url)`	`NET.HOST(url)`	seedifferences below
`TLD(url)`	`NET.PUBLIC_SUFFIX(url)`	seedifferences below
`DOMAIN(url)`	`NET.REG_DOMAIN(url)`	seedifferences below
`PARSE_IP(addr_string)`	`NET.IPV4_TO_INT64(NET.IP_FROM_STRING(addr_string))`
`FORMAT_IP(addr_int64)`	`NET.IP_TO_STRING(NET.IPV4_FROM_INT64(addr_int64 & 0xFFFFFFFF))`
`PARSE_PACKED_IP(addr_string)`	`NET.IP_FROM_STRING(addr_string)`
`FORMAT_PACKED_IP(addr_bytes)`	`NET.IP_TO_STRING(addr_bytes)`
`NVL(expr, null_default)`	`IFNULL(expr, null_default)`

For more information about GoogleSQL functions, seeAll functions.

`COUNT` function comparison

Both legacy SQL and GoogleSQL contain a COUNT function. However, eachfunction behaves differently, depending on the SQL dialect you use.

In legacy SQL,COUNT(DISTINCT x) returns an approximate count. In standardSQL, it returns an exact count. For an approximate count of distinct valuesthat runs faster and requires fewer resources, useAPPROX_COUNT_DISTINCT.

URL function comparison

Both legacy SQL and GoogleSQL contain functions for parsing URLs. In legacySQL, these functions areHOST(url),TLD(url), andDOMAIN(url). In standardSQL, these functions areNET.HOST(url),NET.PUBLIC_SUFFIX(url),andNET.REG_DOMAIN(url).

Improvements in GoogleSQL functions

GoogleSQL URL functions can parse URLs starting with "//".
When the input is not compliant withRFC 3986 or is not a URL(for example, "mailto:?to=&subject=&body="), different rules are applied toparse the input. In particular, GoogleSQL URL functions can parsenon-standard inputs without "//", such as "www.google.com". Forbest results, it is recommended that you ensure that inputs are URLs andcomply with RFC 3986.
NET.PUBLIC_SUFFIX returns results without leading dots. For example, itreturns "com" instead of ".com". This complies with the format in thepublic suffix list.
NET.PUBLIC_SUFFIX andNET.REG_DOMAIN support uppercase letters andinternationalized domain names.TLD andDOMAIN don't support them (mightreturn unexpected results).

Minor differences on edge cases

If the input does not contain any suffix in thepublic suffix list,NET.PUBLIC_SUFFIX andNET.REG_DOMAIN return NULL, whileTLD andDOMAIN return non-NULL values as best effort guesses.
If the input contains only a public suffix without a preceding label (forexample, "http://com"),NET.PUBLIC_SUFFIX returns the public suffix, whileTLD returns an empty string. Similarly,NET.REG_DOMAIN returns NULL, whileDOMAIN returns the public suffix.
For inputs with IPv6 hosts,NET.HOST does not remove brackets from theresult, as specified byRFC 3986.
For inputs with IPv4 hosts,NET.REG_DOMAIN returns NULL, whileDOMAINreturns the first 3 octets.

Examples

In the following table, indicates resultsthat are the same between legacy and GoogleSQL.

URL (description)	HOST	NET.HOST	TLD	NET.PUBLIC _SUFFIX	DOMAIN	NET.REG_DOMAIN
"//google.com" (starting with "//")	NULL	"google.com"	NULL	"com"	NULL	"google.com"
"google.com" (non-standard; no "//")	NULL	"google.com"	NULL	"com"	NULL	"google.com"
"http://user:pass@word@x.com" (non-standard with multiple "@")	"word@x.com"	"x.com"	".com"	"com"	"word@x.com"	"x.com"
"http://foo.com:1:2" (non-standard with multiple ":")	"foo.com:1"	"foo.com"	".com:1"	"com"
"http://x.Co.uk" (upper case letters)			".uk"	"Co.uk"	"Co.uk"	"x.Co.uk"
"http://a.b" (public suffix not found)			".b"	NULL	"a.b"	NULL
"http://com" (host contains only a public suffix)			""	"com"	"com"	NULL
"http://[::1]" (IPv6 host; no public suffix)	"::1"	"[::1]"	""	NULL	"::1"	NULL
"http://1.2.3.4" (IPv4 host; no public suffix)			""	NULL	"1.2.3"	NULL

Differences in repeated field handling

AREPEATED type in legacy SQL is equivalent to anARRAY of that type inGoogleSQL. For example,REPEATED INTEGER is equivalent toARRAY<INT64> inGoogleSQL. The following section discusses some of the differences inoperations on repeated fields between legacy and GoogleSQL.

`NULL` elements and`NULL` arrays

GoogleSQL supportsNULL array elements, but raises an error if there is aNULL array element in the query result. If there is aNULL array column inthe query result, GoogleSQL stores it as an empty array.

Selecting nested repeated leaf fields

Using legacy SQL, you can "dot" into a nested repeated field without needing toconsider where the repetition occurs. In GoogleSQL, attempting to "dot" intoa nested repeated field results in an error. For example:

#standardSQLSELECTrepository.url,payload.pages.page_nameFROM`bigquery-public-data.samples.github_nested`LIMIT5;

Attempting to execute this query returns:

Cannot access field page_name on a value with typeARRAY<STRUCT<action STRING, html_url STRING, page_name STRING, ...>>

To correct the error and return an array ofpage_names in the result, use anARRAY subquery instead. For example:

#standardSQLSELECTrepository.url,ARRAY(SELECTpage_nameFROMUNNEST(payload.pages))ASpage_namesFROM`bigquery-public-data.samples.github_nested`LIMIT5;

For more information about arrays andARRAY subqueries, seeWorking with arrays.

Filtering repeated fields

Using legacy SQL, you can filter repeated fields directly using aWHEREclause. In GoogleSQL, you can express similar logic with aJOIN commaoperator followed by a filter. For example, consider the following legacy SQLquery:

#legacySQLSELECTpayload.pages.titleFROM[bigquery-public-data:samples.github_nested]WHEREpayload.pages.page_nameIN('db_jobskill','Profession');

This query returns alltitles of pages for which thepage_name is eitherdb_jobskill orProfession. You can express a similar query in GoogleSQLas:

#standardSQLSELECTpage.titleFROM`bigquery-public-data.samples.github_nested`,UNNEST(payload.pages)ASpageWHEREpage.page_nameIN('db_jobskill','Profession');

One difference between the preceding legacy SQL and GoogleSQL queries is thatif you unset theFlatten Results option and execute the legacy SQL query,payload.pages.title isREPEATED in the query result. To achieve thesame semantics in GoogleSQL and return an array for thetitle column, useanARRAY subquery instead:

#standardSQLSELECTtitleFROM(SELECTARRAY(SELECTtitleFROMUNNEST(payload.pages)WHEREpage_nameIN('db_jobskill','Profession'))AStitleFROM`bigquery-public-data.samples.github_nested`)WHEREARRAY_LENGTH(title) >0;

This query creates an array oftitles where thepage_name is either'db_jobskill' or'Profession', then filters any rows where the array did notmatch that condition usingARRAY_LENGTH(title) > 0.

For more information about arrays, seeWorking with arrays.

Structure of selected nested leaf fields

Legacy SQL preserves the structure of nested leaf fields in theSELECT listwhen theFlatten Results option is unset, whereas GoogleSQL does not. Forexample, consider the following legacy SQL query:

#legacySQLSELECTrepository.url,repository.has_downloadsFROM[bigquery-public-data.samples.github_nested]LIMIT5;

This query returnsurl andhas_downloads within a record namedrepositorywhenFlatten Results is unset. Now consider the following GoogleSQL query:

#standardSQLSELECTrepository.url,repository.has_downloadsFROM`bigquery-public-data.samples.github_nested`LIMIT5;

This query returnsurl andhas_downloads as top-level columns; they are notpart of arepository record or struct. To return them as part of a struct, usetheSTRUCT operator:

#standardSQLSELECTSTRUCT(repository.url,repository.has_downloads)ASrepositoryFROM`bigquery-public-data.samples.github_nested`LIMIT5;

Removing repetition with`FLATTEN`

GoogleSQL does not have aFLATTEN function as in legacy SQL, but you canachieve similar semantics using theJOIN (comma) operator. For example,consider the following legacy SQL query:

#legacySQLSELECTrepository.url,payload.pages.page_nameFROMFLATTEN([bigquery-public-data:samples.github_nested],payload.pages.page_name)LIMIT5;

You can express a similar query in GoogleSQL as follows:

#standardSQLSELECTrepository.url,page.page_nameFROM`bigquery-public-data.samples.github_nested`,UNNEST(payload.pages)ASpageLIMIT5;

Or, equivalently, useJOIN rather than the comma, operator:

#standardSQLSELECTrepository.url,page.page_nameFROM`bigquery-public-data.samples.github_nested`JOINUNNEST(payload.pages)ASpageLIMIT5;

One important difference is that the legacy SQL query returns a row wherepayload.pages.page_name isNULL ifpayload.pages is empty. The standardSQL query, however, does not return a row ifpayload.pages is empty. Toachieve exactly the same semantics, use aLEFT JOIN orLEFT OUTER JOIN. Forexample:

#standardSQLSELECTrepository.url,page.page_nameFROM`bigquery-public-data.samples.github_nested`LEFTJOINUNNEST(payload.pages)ASpageLIMIT5;

For more information about arrays, seeWorking with arrays. For moreinformation aboutUNNEST, see theUNNEST topic.

Filtering rows with`OMIT RECORD IF`

TheOMIT IF clause from legacy SQL lets you filter rows based on acondition that can apply to repeated fields. In GoogleSQL, you can modelanOMIT IF clause with anEXISTS clause,IN clause, or simple filter.For example, consider the following legacy SQL query:

#legacySQLSELECTrepository.url,FROM[bigquery-public-data:samples.github_nested]OMITRECORDIFEVERY(payload.pages.page_name!='db_jobskill'ANDpayload.pages.page_name!='Profession');

The analogous GoogleSQL query is:

#standardSQLSELECTrepository.urlFROM`bigquery-public-data.samples.github_nested`WHEREEXISTS(SELECT1FROMUNNEST(payload.pages)WHEREpage_name='db_jobskill'ORpage_name='Profession');

Here theEXISTS clause evaluates totrue if there is at least one element ofpayload.pages where the page name is'db_jobskill' or'Profession'.

Alternatively, suppose that the legacy SQL query usesIN:

#legacySQLSELECTrepository.url,FROM[bigquery-public-data:samples.github_nested]OMITRECORDIFNOTSOME(payload.pages.page_nameIN('db_jobskill','Profession'));

In GoogleSQL, you can express the query using anEXISTS clause withIN:

#standardSQLSELECTrepository.urlFROM`bigquery-public-data.samples.github_nested`WHEREEXISTS(SELECT1FROMUNNEST(payload.pages)WHEREpage_nameIN('db_jobskill','Profession'));

Consider the following legacy SQL query that filters records with 80 or fewerpages:

#legacySQLSELECTrepository.url,FROM[bigquery-public-data:samples.github_nested]OMITRECORDIFCOUNT(payload.pages.page_name)<=80;

In this case, you can use a filter withARRAY_LENGTH in GoogleSQL:

#standardSQLSELECTrepository.urlFROM`bigquery-public-data.samples.github_nested`WHEREARRAY_LENGTH(payload.pages) >80;

Note that theARRAY_LENGTH function applies to the repeatedpayload.pagesfield directly rather than the nested fieldpayload.pages.page_name as in thelegacy SQL query.

For more information about arrays andARRAY subqueries, seeWorking with arrays.

Semantic differences

The semantics of some operations differ between legacy and GoogleSQL.

Runtime errors

Some functions in legacy SQL returnNULL for invalid input, potentiallymasking problems in queries or in data. GoogleSQL is generally more strict,and raises an error if an input is invalid.

For all mathematical functions and operators, legacy SQL does not check foroverflows. GoogleSQL adds overflow checks, and raises an error if acomputation overflows. This includes the+,-,* operators, theSUM,AVG, andSTDDEV aggregate functions, and others.
GoogleSQL raises an error upon division by zero, whereas legacy SQL returnsNULL. To returnNULL for division by zero in GoogleSQL, useSAFE_DIVIDE.
GoogleSQL raises an error forCASTs where the input format is invalid orout of range for the target type, whereas legacy SQL returnsNULL. To avoidraising an error for an invalid cast in GoogleSQL, useSAFE_CAST.

Nested repeated results

Queries executed using GoogleSQL preserve any nesting and repetition of thecolumns in the result, and theFlatten Results option has no effect. Toreturn top-level columns for nested fields, use the.* operator on structcolumns. For example:

#standardSQLSELECTrepository.*FROM`bigquery-public-data.samples.github_nested`LIMIT5;

To return top-level columns for repeated nested fields (ARRAYs ofSTRUCTs),use aJOIN to take the cross product of the table's rows and the elements ofthe repeated nested field. For example:

#standardSQLSELECTrepository.url,page.*FROM`bigquery-public-data.samples.github_nested`JOINUNNEST(payload.pages)ASpageLIMIT5;

For more information about arrays andARRAY subqueries, seeWorking with arrays.

NOT IN conditions and NULL

Legacy SQL does not comply with the SQL standard in its handling ofNULL withNOT IN conditions, whereas GoogleSQL does. Consider the following legacySQL query, which finds the number of words that don't appear in the GitHubsample table as locations:

#legacySQLSELECTCOUNT(*)FROM[bigquery-public-data.samples.shakespeare]WHEREwordNOTIN(SELECTactor_attributes.locationFROM[bigquery-public-data.samples.github_nested]);

This query returns 163,716 as the count, indicating that there are 163,716 wordsthat don't appear as locations in the GitHub table. Now consider the followingGoogleSQL query:

#standardSQLSELECTCOUNT(*)FROM`bigquery-public-data.samples.shakespeare`WHEREwordNOTIN(SELECTactor_attributes.locationFROM`bigquery-public-data.samples.github_nested`);

This query returns 0 as the count. The difference is due to the semantics ofNOT IN with GoogleSQL, which returnsNULL if any value on the right handside isNULL. To achieve the same results as with the legacy SQL query, use aWHERE clause to exclude theNULL values:

#standardSQLSELECTCOUNT(*)FROM`bigquery-public-data.samples.shakespeare`WHEREwordNOTIN(SELECTactor_attributes.locationFROM`bigquery-public-data.samples.github_nested`WHEREactor_attributes.locationISNOTNULL);

This query returns 163,716 as the count. Alternatively, use aNOT EXISTScondition:

#standardSQLSELECTCOUNT(*)FROM`bigquery-public-data.samples.shakespeare`AStWHERENOTEXISTS(SELECT1FROM`bigquery-public-data.samples.github_nested`WHEREt.word=actor_attributes.location);

This query also returns 163,716 as the count. For further reading, see thecomparison operatorssection of the documentation, which explains the semantics ofIN,NOT IN,EXISTS, and other comparison operators.

Differences in user-defined JavaScript functions

TheUser-defined functionstopic documents how to use JavaScript user-defined functions with GoogleSQL.This section explains some of the key differences between user-defined functionsin legacy and GoogleSQL.

Functions in the query text

With GoogleSQL, you useCREATE TEMPORARY FUNCTION as part of the query bodyrather than specifying user-defined functions separately. Examples of definingfunctions separately include using the UDF Editor in the Google Cloud consoleor using the--udf_resource flag inthe bq command-line tool.

Consider the following GoogleSQL query:

#standardSQL-- Computes the harmonic mean of the elements in 'arr'.-- The harmonic mean of x_1, x_2, ..., x_n can be expressed as:--   n / ((1 / x_1) + (1 / x_2) + ... + (1 / x_n))CREATETEMPORARYFUNCTIONHarmonicMean(arrARRAY<FLOAT64>)RETURNSFLOAT64LANGUAGEjsAS"""var sum_of_reciprocals = 0;for (var i = 0; i < arr.length; ++i) {  sum_of_reciprocals += 1 / arr[i];}return arr.length / sum_of_reciprocals;""";WITHTAS(SELECTGENERATE_ARRAY(1.0,x*4,x)ASarrFROMUNNEST([1,2,3,4,5])ASx)SELECTarr,HarmonicMean(arr)ASh_meanFROMT;

This query defines a JavaScript function namedHarmonicMean and then appliesit to the array columnarr fromT.

For more information about user-defined functions, see theUser-defined functions topic.

Functions operate on values rather than rows

In legacy SQL, JavaScript functions operate on rows from a table. In standardSQL, as in the preceding example, JavaScript functions operate on values. To pass arow value to a JavaScript function using GoogleSQL, define a function thattakes a struct of the same row type as the table. For example:

#standardSQL-- Takes a struct of x, y, and z and returns a struct with a new field foo.CREATETEMPORARYFUNCTIONAddField(sSTRUCT<xFLOAT64,yBOOL,zSTRING>)RETURNSSTRUCT<xFLOAT64,yBOOL,zSTRING,fooSTRING>LANGUAGEjsAS"""var new_struct = new Object();new_struct.x = s.x;new_struct.y = s.y;new_struct.z = s.z;if (s.y) {  new_struct.foo = 'bar';} else {  new_struct.foo = 'baz';}return new_struct;""";WITHTAS(SELECTx,MOD(off,2)=0ASy,CAST(xASSTRING)ASzFROMUNNEST([5.0,4.0,3.0,2.0,1.0])ASxWITHOFFSEToff)SELECTAddField(t).*FROMTASt;

This query defines a JavaScript function that takes a struct with the same rowtype asT and creates a new struct with an additional field namedfoo. TheSELECT statement passes the rowt as input to the function and uses.* toreturn the fields of the resulting struct in the output.

Migrating table wildcard functions

This section describes in detail how to migrate legacy SQLtable wildcard functions to GoogleSQL.

The TABLE_DATE_RANGE function

The legacy SQLTABLE_DATE_RANGE functions work on tables that conform to aspecific naming scheme:<prefix>YYYYMMDD, where the<prefix> represents thefirst part of a table name andYYYYMMDD represents the date associated withthat table's data.

For example, the following legacy SQL query finds the average temperature froma set of daily tables that contain Seattle area weather data:

#legacySQLSELECTROUND(AVG(TemperatureF),1)ASAVG_TEMP_FFROMTABLE_DATE_RANGE([mydataset.sea_weather_],TIMESTAMP("2016-05-01"),TIMESTAMP("2016-05-09"))

In GoogleSQL, an equivalent query uses a table wildcard and theBETWEENclause.

#standardSQLSELECTROUND(AVG(TemperatureF),1)ASAVG_TEMP_FFROM`mydataset.sea_weather_*`WHERE_TABLE_SUFFIXBETWEEN'20160501'AND'20160509'

The TABLE_DATE_RANGE_STRICT function

This function is equivalent toTABLE_DATE_RANGE, except that it fails with an error if any daily table in the sequence is missing.

Note: If you don't need the query to fail when tables are missing, use the simpler_TABLE_SUFFIX BETWEEN 'YYYYMMDD' AND 'YYYYMMDD' filter, as shown in the TABLE_DATE_RANGE migration example.

Detailed migration for the TABLE_DATE_RANGE_STRICT function

Consider the same query as in the previous example, with theTABLE_DATE_RANGE_STRICT function:

#legacySQLSELECTROUND(AVG(TemperatureF),1)ASAVG_TEMP_FFROMTABLE_DATE_RANGE_STRICT([mydataset.sea_weather_],TIMESTAMP("2016-05-01"),TIMESTAMP("2016-05-09"))

To make an equivalent GoogleSQL query with error handling, the query should be split into two phases: validation and query execution. Depending on the application and query complexity, it can be achieved usingComposability using WITH clauses orBigQuery Scripting (procedural SQL).

Translated query usingWITH clause and conditionalERROR:

#standardSQLWITHMissingTablesAS(SELECTSTRING_AGG(FORMAT_DATE('%Y%m%d',day),', ')ASmissing_suffixesFROMUNNEST(GENERATE_DATE_ARRAY(DATE'2016-05-01',DATE'2016-05-09'))ASdayWHEREFORMAT_DATE('%Y%m%d',day)NOTIN(SELECT_TABLE_SUFFIXFROM`mydataset.sea_weather_*`WHERE_TABLE_SUFFIXBETWEEN'20160501'AND'20160509'))SELECTIF(missing_suffixesISNOTNULL,ERROR(FORMAT("Not found suffixes: %s",missing_suffixes)),(SELECTROUND(AVG(TemperatureF),1)FROM`mydataset.sea_weather_*`WHERE_TABLE_SUFFIXBETWEEN'20160501'AND'20160509'))ASAVG_TEMP_FFROMMissingTables

And usingBigQuery Scripting (procedural SQL):

#standardSQLBEGINDECLAREmissing_suffixesSTRING;SETmissing_suffixes=(SELECTSTRING_AGG(FORMAT_DATE('%Y%m%d',day),', ')FROMUNNEST(GENERATE_DATE_ARRAY(DATE'2016-05-01',DATE'2016-05-09'))ASdayWHEREFORMAT_DATE('%Y%m%d',day)NOTIN(SELECT_TABLE_SUFFIXFROM`mydataset.sea_weather_*`WHERE_TABLE_SUFFIXBETWEEN'20160501'AND'20160509'));-- Raise error or run queryIFmissing_suffixesISNOTNULLTHENSELECTERROR(FORMAT("Not found suffixes: %s",missing_suffixes));ELSESELECTROUND(AVG(TemperatureF),1)ASAVG_TEMP_FFROM`mydataset.sea_weather_*`WHERE_TABLE_SUFFIXBETWEEN'20160501'AND'20160509';ENDIF;END;

The TABLE_QUERY function

The legacy SQLTABLE_QUERY function lets you find table names basedon patterns. When migrating aTABLE_QUERY function to GoogleSQL, whichdoes not support theTABLE_QUERY function, you can instead filter usingthe_TABLE_SUFFIX pseudocolumn. Keep the following differences in mind whenmigrating:

In legacy SQL, you place theTABLE_QUERY function in theFROM clause,whereas in GoogleSQL, you filter using the_TABLE_SUFFIX pseudocolumn intheWHERE clause.
In legacy SQL, theTABLE_QUERY function operates on the entire table name(ortable_id), whereas in GoogleSQL, the_TABLE_SUFFIX pseudocolumncontains part or all of the table name, depending on how you use the wildcardcharacter.

Filter in the WHERE clause

When migrating from legacy SQL to GoogleSQL, move the filter to theWHEREclause. For example, the following query finds the maximum temperatures acrossall years that end in the number0:

#legacySQLSELECTmax,ROUND((max-32)*5/9,1)celsius,yearFROMTABLE_QUERY([bigquery-public-data:noaa_gsod],'REGEXP_MATCH(table_id, r"0$")')WHEREmax!=9999.9# code for missing dataANDmax >100# to improve ORDER BY performanceORDERBYmaxDESC

In GoogleSQL, an equivalent query uses a table wildcard and places theregular expression function,REGEXP_CONTAINS, in theWHERE clause:

#standardSQLSELECTmax,ROUND((max-32)*5/9,1)celsius,yearFROM`bigquery-public-data.noaa_gsod.gsod*`WHEREmax!=9999.9# code for missing dataANDmax >100# to improve ORDER BY performanceANDREGEXP_CONTAINS(_TABLE_SUFFIX,r"0$")ORDERBYmaxDESC

Differences between table_id and _TABLE_SUFFIX

In the legacy SQLTABLE_QUERY(dataset, expr) function, the second parameteris an expression that operates over the entire table name, using the valuetable_id. When migrating to GoogleSQL, the filter that you create in theWHERE clause operates on the value of_TABLE_SUFFIX, which can include partor all of the table name, depending on your use of the wildcard character.

For example, the following legacy SQL query uses the entire table name in aregular expression to find the maximum temperatures across all years that endin the number0:

#legacySQLSELECTmax,ROUND((max-32)*5/9,1)celsius,yearFROMTABLE_QUERY([bigquery-public-data:noaa_gsod],'REGEXP_MATCH(table_id, r"gsod\d{3}0")')WHEREmax!=9999.9# code for missing dataANDmax >100# to improve ORDER BY performanceORDERBYmaxDESC

In GoogleSQL, an equivalent query can use the entire table name or only apart of the table name. You can use an empty prefix in GoogleSQL so thatyour filter operates over the entire table nameFROM `bigquery-public-data.noaa_gsod.*`.

However, longer prefixes perform better than empty prefixes, so the followingexample uses a longer prefix, which means that the value of_TABLE_SUFFIXis only part of the table name.

#standardSQLSELECTmax,ROUND((max-32)*5/9,1)celsius,yearFROM`bigquery-public-data.noaa_gsod.gsod*`WHEREmax!=9999.9# code for missing dataANDmax >100# to improve ORDER BY performanceANDREGEXP_CONTAINS(_TABLE_SUFFIX,r"\d{3}0")ORDERBYmaxDESC

Except as otherwise noted, the content of this page is licensed under theCreative Commons Attribution 4.0 License, and code samples are licensed under theApache 2.0 License. For details, see theGoogle Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2026-02-19 UTC.

Movatterモバイル変換

Migrating to GoogleSQL

Comparison of legacy and GoogleSQL

Do I have to migrate to GoogleSQL?

Enabling GoogleSQL

Advantages of GoogleSQL

GoogleSQL highlights

Composability usingWITH clauses

Composability using SQL functions

Subqueries in more places

Correlated subqueries

Arrays and structs

Data type differences

TIMESTAMP type differences

Automatic data type coercions

Syntax differences

Escaping reserved keywords and invalid identifiers

Project-qualified table names

Table decorators

Time decorator (snapshot decorator)

Range decorator

Partition decorator

Table wildcard functions

Comma operator with tables

Logical views

Function comparison

COUNT function comparison

URL function comparison

Differences in repeated field handling

NULL elements andNULL arrays

Selecting nested repeated leaf fields

Filtering repeated fields

Structure of selected nested leaf fields

Removing repetition withFLATTEN

Filtering rows withOMIT RECORD IF

Semantic differences

Runtime errors

Nested repeated results

NOT IN conditions and NULL

Differences in user-defined JavaScript functions

Functions in the query text

Functions operate on values rather than rows

Migrating table wildcard functions

The TABLE_DATE_RANGE function

The TABLE_DATE_RANGE_STRICT function

The TABLE_QUERY function

Filter in the WHERE clause

Differences between table_id and _TABLE_SUFFIX

Composability using`WITH` clauses

`TIMESTAMP` type differences

`COUNT` function comparison

`NULL` elements and`NULL` arrays

Removing repetition with`FLATTEN`

Filtering rows with`OMIT RECORD IF`