Legacy SQL Syntax, Functions and Operators
This document details legacy SQL query syntax, functions and operators. The preferred query syntax for BigQuery is GoogleSQL. For information on GoogleSQL, seeGoogleSQL query syntax.
Query syntax
Note: Keywords arenot case-sensitive. In this document, keywords such asSELECT are capitalized for illustration purposes.
SELECT clause
TheSELECT clause specifies a list of expressions to be computed. Expressions in theSELECT clause can contain field names, literals, andfunction calls (includingaggregate functions andwindow functions) as well as combinations of the three. The expression list is comma-separated.
Each expression can be given an alias by adding a space followed by an identifier after the expression. The optionalAS keyword can be added between the expression and the alias for improved readability. Aliases defined in aSELECT clause can be referenced in theGROUP BY,HAVING, andORDER BY clauses of the query, but not by theFROM,WHERE, orOMIT RECORD IF clauses nor by other expressions in the sameSELECT clause.
Notes:
- If you use anaggregate function in your
SELECTclause, you must either use an aggregate function in all expressions or your query must have aGROUP BYclause which includes all non-aggregated fields in yourSELECTclause as grouping keys. For example:#legacySQLSELECTword,corpus,COUNT(word)FROM[bigquery-public-data:samples.shakespeare]WHEREwordCONTAINS"th"GROUPBYword,corpus;/* Succeeds because all non-aggregated fields are group keys. */
#legacySQLSELECTword,corpus,COUNT(word)FROM[bigquery-public-data:samples.shakespeare]WHEREwordCONTAINS"th"GROUPBYword;/* Fails because corpus is not aggregated nor is it a group key. */
- You can use square brackets to escapereserved words so that you can use them as field name and aliases. For example, if you have a column named "partition", which is a reserved word in BigQuery syntax, the queries referencing that field fail with obscure error messages unless you escape it with square brackets:
SELECT[partition]FROM...
Example
This example defines aliases in theSELECT clause and then references one of them in theORDER BY clause. Notice that theword column can not be referenced using theword_alias in theWHERE clause; it must be referenced by name. Thelen alias also is not visible in theWHERE clause. It would be visible to aHAVING clause.
#legacySQLSELECTwordASword_alias,LENGTH(word)ASlenFROM[bigquery-public-data:samples.shakespeare]WHEREwordCONTAINS'th'ORDERBYlen;
WITHIN modifier for aggregate functions
aggregate_function WITHIN RECORD [ [ AS ]alias ]
TheWITHIN keyword causes the aggregate function to aggregate across repeated values within each record. For every input record, exactly one aggregated output will be produced. This type of aggregation is referred to asscoped aggregation. Since scoped aggregation produces output for every record, non-aggregated expressions can be selected alongside scoped-aggregated expressions without using aGROUP BY clause.
Most commonly you will use theRECORD scope when using scoped aggregation. If you have a very complex nested, repeated schema, you may find a need to perform aggregations within sub-record scopes. This can be done by replacing theRECORD keyword in the syntax above with the name of the node in your schema where you want the aggregation to be performed. For more information about that advanced behavior, seeDealing with data.
Example
This example performs a scopedCOUNT aggregation and then filters and sorts the records by the aggregated value.
#legacySQLSELECTrepository.url,COUNT(payload.pages.page_name)WITHINRECORDASpage_countFROM[bigquery-public-data:samples.github_nested]HAVINGpage_count >80ORDERBYpage_countDESC;
FROM clause
FROM[project_name:]datasetId.tableId [ [ AS ]alias ] | (subquery) [ [ AS ]alias ] |JOINclause |FLATTENclause |table wildcard function
TheFROM clause specifies the source data to be queried. BigQuery queries can execute directly over tables, over subqueries, over joined tables, and over tables modified by special-purpose operators described below. Combinations of these data sources can be queried using thecomma, which is theUNION ALL operator in BigQuery.
Referencing tables
When referencing a table, bothdatasetId andtableId must be specified;project_name is optional. Ifproject_name is not specified, BigQuery defaults to the current project. If your project name includes a dash, you must surround the entire table reference with brackets.
Example
[my-dashed-project:dataset1.tableName]
Tables can be given an alias by adding a space followed by an identifier after the table name. The optionalAS keyword can be added between thetableId and the alias for improved readability.
When referencing columns from a table, you can use the simple column name or you can prefix the column name with either the alias, if you specified one, or with thedatasetId andtableId as long as noproject_name was specified. Theproject_name cannot be included in the column prefix because the colon character is not allowed in field names.
Examples
This example references a column with no table prefix.
#legacySQLSELECTwordFROM[bigquery-public-data:samples.shakespeare];
This example prefixes the column name with thedatasetId andtableId. Notice that theproject_name cannot be included in this example. This method will only work if the dataset is in your current default project.
#legacySQLSELECTsamples.shakespeare.wordFROMsamples.shakespeare;
This example prefixes the column name with a table alias.
#legacySQLSELECTt.wordFROM[bigquery-public-data:samples.shakespeare]ASt;
Integer-range partitioned tables
Legacy SQL supports using table decorators to address a specific partition in an integer-range partitioned table. The key to address a range partition is the start of the range.
The following example queries the range partition that starts with 30:
#legacySQLSELECT*FROMdataset.table$30;
Note that you cannot use legacy SQL to query across an entire integer-range partitioned table. Instead, the query returns an error like the following:
Querying tables partitioned on a field is not supported in Legacy SQLUsing subqueries
Asubquery is a nestedSELECT statement wrapped in parentheses. The expressions computed in theSELECT clause of the subquery are available to the outer query just as columns of atable would be available.
Subqueries can be used to compute aggregations and other expressions. The full range of SQL operators are available in the subquery. This means a subquery can itself contain other subqueries, subqueries can perform joins and grouping aggregations, etc.
Comma asUNION ALL
Unlike GoogleSQL, legacy SQL uses the comma as aUNION ALL operator rather than aCROSS JOIN operator. This is a legacy behavior that evolved because historically BigQuery did not supportCROSS JOIN and BigQuery users regularly needed to writeUNION ALL queries. In GoogleSQL, queries that perform unions are particularly verbose. Using the comma as the union operator allows such queries to be written much more efficiently. For example, this query can be used to run a single query over logs from multiple days.
#legacySQLSELECTFORMAT_UTC_USEC(event.timestamp_in_usec)AStime,request_urlFROM[applogs.events_20120501],[applogs.events_20120502],[applogs.events_20120503]WHEREevent.username='root'ANDNOTevent.source_ip.is_internal;
Queries that union a large number of tables typically run more slowly than queries that process the same amount of data from a single table. The difference in performance can be up to 50 ms per additional table. A single query can union at most 1,000 tables.
Table wildcard functions
The termtable wildcard function refers to a special type of function unique to BigQuery. These functions are used in theFROM clause to match a collection of table names using one of several types of filters. For example, theTABLE_DATE_RANGE function can be used to query only a specific set of daily tables. For more information on these functions, seeTable wildcard functions.
FLATTEN operator
(FLATTEN([project_name:]datasetId.tableId, field_to_be_flattened))(FLATTEN((subquery), field_to_be_flattened))
Unlike typical SQL-processing systems, BigQuery is designed to handle repeated data. Because of this, BigQuery users sometimes need to write queries that manipulate the structure of repeated records. One way to do this is by using theFLATTEN operator.
FLATTEN converts one node in the schema from repeated to optional. Given a record with one or more values for a repeated field,FLATTEN will create multiple records, one for each value in the repeated field. All other fields selected from the record are duplicated in each new output record.FLATTEN can be applied repeatedly in order to remove multiple levels of repetition.
For more information and examples, seeDealing with data.
JOIN operator
BigQuery supports multipleJOIN operators in eachFROM clause. SubsequentJOIN operations use the results of the previousJOIN operation as the leftJOIN input. Fields from any precedingJOIN input can be used as keys in theON clauses of subsequentJOIN operators.
JOIN types
BigQuery supportsINNER,[FULL|RIGHT|LEFT] OUTER andCROSS JOIN operations. If left unspecified, the default isINNER.
CROSS JOIN operations do not allowON clauses.CROSS JOIN can return a large amount of data and might result in a slow and inefficient query or in a query that exceeds the maximum allowed per-query resources. Such queries will fail with an error. When possible, prefer queries that do not useCROSS JOIN. For example,CROSS JOIN is often used in places wherewindow functions would be more efficient.
EACH modifier
TheEACH modifier is a hint that tells BigQuery to execute theJOIN using multiple partitions. This is particularly useful when you know that both sides of theJOIN are large. TheEACH modifier can't be used inCROSS JOIN clauses.
EACH used to be encouraged in many cases, but this is no longer the case. When possible, useJOIN without theEACH modifier for better performance. UseJOIN EACH when your query has failed with a resources exceeded error message.
Semi-join and Anti-join
In addition to supportingJOIN in theFROM clause, BigQuery also supports two types of joins in theWHERE clause: semi-join and anti-semi-join. A semi-join is specified using theIN keyword with a subquery; anti-join, using theNOT IN keywords.
Examples
The following query uses a semi-join to find ngrams where the first word in the ngram is also the second word in another ngram that has "AND" as the third word in the ngram.
#legacySQLSELECTngramFROM[bigquery-public-data:samples.trigrams]WHEREfirstIN(SELECTsecondFROM[bigquery-public-data:samples.trigrams]WHEREthird="AND")LIMIT10;
The following query uses a semi-join to return the number of women over age 50 who gave birth in the 10 states with the most births.
#legacySQLSELECTmother_age,COUNT(mother_age)totalFROM[bigquery-public-data:samples.natality]WHEREstateIN(SELECTstateFROM(SELECTstate,COUNT(state)totalFROM[bigquery-public-data:samples.natality]GROUPBYstateORDERBYtotalDESCLIMIT10))ANDmother_age>50GROUPBYmother_ageORDERBYmother_ageDESC
To see the numbers for the other 40 states, you can use an anti-join. The following query is nearly identical to the previous example, but usesNOT IN instead ofIN to return the number of women over age 50 who gave birth in the 40 states with the least births.
#legacySQLSELECTmother_age,COUNT(mother_age)totalFROM[bigquery-public-data:samples.natality]WHEREstateNOTIN(SELECTstateFROM(SELECTstate,COUNT(state)totalFROM[bigquery-public-data:samples.natality]GROUPBYstateORDERBYtotalDESCLIMIT10))ANDmother_age>50GROUPBYmother_ageORDERBYmother_ageDESC
Notes:
- BigQuery does not support correlated semi- or anti-semi-joins. The subquery can not reference any fields from the outer query.
- The subquery used in a semi- or anti-semi-join must select exactly one field.
- The types of the selected field and the field being used from the outer query in the
WHEREclause must match exactly. BigQuery will not do any type coercion for semi- or anti-semi-joins.
WHERE clause
TheWHERE clause, sometimes called the predicate, filters records produced by theFROM clause using a boolean expression. Multiple conditions can be joined by booleanAND andOR clauses, optionally surrounded by parentheses—()— to group them. The fields listed in aWHERE clause do not need to be selected in the correspondingSELECT clause and theWHERE clause expression cannot reference expressions computed in theSELECT clause of the query to which theWHERE clause belongs.
Note: Aggregate functions cannot be used in theWHERE clause. Use aHAVING clause and an outer query if you need to filter on the output of an aggregate function.
Example
The following example uses a disjunction of boolean expressions in theWHERE clause—the two expressions joined by anOR operator. An input record will pass through theWHERE filter if either of the expressions returnstrue.
#legacySQLSELECTwordFROM[bigquery-public-data:samples.shakespeare]WHERE(wordCONTAINS'prais'ANDwordCONTAINS'ing')OR(wordCONTAINS'laugh'ANDwordCONTAINS'ed');
OMIT RECORD IF clause
TheOMIT RECORD IF clause is a construct that is unique to BigQuery. It is particularly useful for dealing with nested, repeated schemas. It is similar to aWHERE clause, but different in two important ways. First, it uses an exclusionary condition, which means that records are omitted if the expression returnstrue, but kept if the expression returnsfalse ornull. Second, theOMIT RECORD IF clause can (and usually does) use scoped aggregate functions in its condition.
In addition to filtering full records,OMIT...IF can specify a more narrow scope to filter just portions of a record. This is done by using the name of a non-leaf node in your schema rather thanRECORD in yourOMIT...IF clause. This functionality is rarely used by BigQuery users. You can find more documentation about this advanced behavior linked from theWITHIN documentation above.
If you useOMIT...IF to exclude a portion of a record in a repeating field, and the query also selects other independently repeating fields, BigQuery omits a portion of the other repeated records in the query. If you see the errorCannot perform OMIT IF on repeated scope <scope> with independently repeating pass through field <field>, we recommend that you switch to GoogleSQL. For information about migratingOMIT...IF statements to GoogleSQL, seeMigrating to GoogleSQL.
Example
Referring back to the example used for theWITHIN modifier,OMIT RECORD IF can be used to accomplish the same thingWITHIN andHAVING were used to do in that example.
#legacySQLSELECTrepository.urlFROM[bigquery-public-data:samples.github_nested]OMITRECORDIFCOUNT(payload.pages.page_name)<=80;
GROUP BY clause
TheGROUP BY clause lets you group rows that have the same values for a given field or set of fields so that you can compute aggregations of related fields. Grouping occurs after the filtering performed in theWHERE clause but before the expressions in theSELECT clause are computed. The expression results cannot be used as group keys in theGROUP BY clause.
Example
This query finds the top ten most commonfirst words in the trigrams sample dataset. In addition to demonstrating the use of theGROUP BY clause, it demonstrates how positional indexes can be used instead of field names in theGROUP BY andORDER BY clauses.
#legacySQLSELECTfirst,COUNT(ngram)FROM[bigquery-public-data:samples.trigrams]GROUPBY1ORDERBY2DESCLIMIT10;
Aggregation performed using aGROUP BY clause is calledgrouped aggregation. Unlikescoped aggregation, grouped aggregation is common in most SQL processing systems.
TheEACH modifier
TheEACH modifier is a hint that tells BigQuery to execute theGROUP BY using multiple partitions. This is particularly useful when you know that your dataset contains a large number of distinct values for the group keys.
EACH used to be encouraged in many cases, but this is no longer the case. UsingGROUP BY without theEACH modifier usually provides better performance. UseGROUP EACH BY when your query has failed with a resources exceeded error message.
TheROLLUP function
When theROLLUP function is used, BigQuery adds extra rows to the query result that representrolled up aggregations. All fields listed afterROLLUP must be enclosed in a single set of parentheses. In rows added because of theROLLUP function,NULL indicates the columns for which the aggregation is rolled up.
Example
This query generates per-year counts of male and female births from the sample natality dataset.
#legacySQLSELECTyear,is_male,COUNT(1)ascountFROM[bigquery-public-data:samples.natality]WHEREyear>=2000ANDyear<=2002GROUPBYROLLUP(year,is_male)ORDERBYyear,is_male;
These are the results of the query. Notice that there are rows where one or both of the group keys areNULL. These rows are therollup rows.
+------+---------+----------+| year | is_male | count |+------+---------+----------+| NULL | NULL | 12122730 || 2000 | NULL | 4063823 || 2000 | false | 1984255 || 2000 | true | 2079568 || 2001 | NULL | 4031531 || 2001 | false | 1970770 || 2001 | true | 2060761 || 2002 | NULL | 4027376 || 2002 | false | 1966519 || 2002 | true | 2060857 |+------+---------+----------+
When using theROLLUP function, you can use theGROUPING function to distinguish between rows that were added because of theROLLUP function and rows that actually have aNULL value for the group key.
Example
This query adds theGROUPING function to the previous example to better identify the rows added because of theROLLUP function.
#legacySQLSELECTyear,GROUPING(year)asrollup_year,is_male,GROUPING(is_male)asrollup_gender,COUNT(1)ascountFROM[bigquery-public-data:samples.natality]WHEREyear>=2000ANDyear<=2002GROUPBYROLLUP(year,is_male)ORDERBYyear,is_male;
These are the result the new query returns.
+------+-------------+---------+---------------+----------+| year | rollup_year | is_male | rollup_gender | count |+------+-------------+---------+---------------+----------+| NULL | 1 | NULL | 1 | 12122730 || 2000 | 0 | NULL | 1 | 4063823 || 2000 | 0 | false | 0 | 1984255 || 2000 | 0 | true | 0 | 2079568 || 2001 | 0 | NULL | 1 | 4031531 || 2001 | 0 | false | 0 | 1970770 || 2001 | 0 | true | 0 | 2060761 || 2002 | 0 | NULL | 1 | 4027376 || 2002 | 0 | false | 0 | 1966519 || 2002 | 0 | true | 0 | 2060857 |+------+-------------+---------+---------------+----------+
Notes:
- Non-aggregated fields in the
SELECTclausemust be listed in theGROUP BYclause.#legacySQLSELECTword,corpus,COUNT(word)FROM[bigquery-public-data:samples.shakespeare]WHEREwordCONTAINS"th"GROUPBYword,corpus;/* Succeeds because all non-aggregated fields are group keys. */
#legacySQLSELECTword,corpus,COUNT(word)FROM[bigquery-public-data:samples.shakespeare]WHEREwordCONTAINS"th"GROUPBYword;/* Fails because corpus is not aggregated nor is it a group key. */
- Expressions computed in the
SELECTclause cannot be used in the correspondingGROUP BYclause.#legacySQLSELECTword,corpus,COUNT(word)word_countFROM[bigquery-public-data:samples.shakespeare]WHEREwordCONTAINS"th"GROUPBYword,corpus,word_count;/* Fails because word_count is not visible to this
GROUP BYclause. */ - Grouping by float and double values is not supported, because the equality function for those types is not well-defined.
- Because the system is interactive, queries that produce a large number of groups might fail. The use of the
TOPfunction instead ofGROUP BYmight solve some scaling problems.
HAVING clause
TheHAVING clause behaves exactly like theWHERE clause except that it is evaluated after theSELECT clause so the results of all computed expressions are visible to theHAVING clause. The HAVING clause can only refer to outputs of the correspondingSELECTclause.
Example
This query computes the most commonfirst words in the ngram sample dataset that contain the letter a and occur at most 10,000 times.
#legacySQLSELECTfirst,COUNT(ngram)ngram_countFROM[bigquery-public-data:samples.trigrams]GROUPBY1HAVINGfirstcontains"a"ANDngram_count <10000ORDERBY2DESCLIMIT10;
ORDER BY clause
TheORDER BY clause sorts the results of a query in ascending or descending order using one or more key fields. To sort by multiple fields or aliases, enter them as a comma-separated list. The results are sorted on the fields in the order in which they are listed. UseDESC (descending) orASC (ascending) to specify the sort direction.ASC is the default. A different sort direction can be specified for each sort key.
TheORDER BY clause is evaluated after theSELECT clause so it can reference the output of any expression computed in theSELECT. If a field is given an alias in theSELECT clause, the alias must be used in theORDER BY clause.
LIMIT clause
TheLIMIT clause limits the number of rows in the returned result set. Since BigQuery queries regularly operate over very large numbers of rows,LIMIT is a good way to avoid long-running queries by processing only a subset of the rows.
Notes:
- The
LIMITclause will stop processing and return results when it satisfies your requirements. This can reduce processing time for some queries, but when you specify aggregate functions such as COUNT orORDER BYclauses, the full result set must still be processed before returning results. TheLIMITclause is the last to be evaluated. - A query with a
LIMITclause may still be non-deterministic if there is no operator in the query that guarantees the ordering of the output result set. This is because BigQuery executes using a large number of parallel workers. The order in which parallel jobs return is not guaranteed. - The
LIMITclause cannot contain any functions; it takes only a numeric constant. - When the
LIMITclause is used, the total bytes processed and the bytes billed can vary for the same query.
Query grammar
The individual clauses of BigQuerySELECT statements are described in detailabove. Here we present the full grammar ofSELECTstatements in a compact form with links back to the individual sections.
query:SELECT{*|field_path.*|expression}[[AS]alias][,...][FROMfrom_body[WHEREbool_expression][OMITRECORDIFbool_expression][GROUP[EACH]BY[ROLLUP]{field_name_or_alias}[,...]][HAVINGbool_expression][ORDERBYfield_name_or_alias[{DESC|ASC}][,...]][LIMITn]];from_body:{from_item[,...]|#Warning:CommameansUNIONALLherefrom_item[join_type]JOIN[EACH]from_item[ONjoin_predicate]|(FLATTEN({table_name|(query)},field_name_or_alias))|table_wildcard_function}from_item:{table_name|(query)}[[AS]alias]join_type:{INNER|[FULL][OUTER]|RIGHT[OUTER]|LEFT[OUTER]|CROSS}join_predicate:field_from_one_side_of_the_join=field_from_the_other_side_of_the_join[AND...]expression:{literal_value|field_name_or_alias|function_call}bool_expression:{expression_which_results_in_a_boolean_value|bool_expressionANDbool_expression|bool_expressionORbool_expression|NOTbool_expression}
Notation:
- Square brackets "[ ]" indicate optional clauses.
- Curly braces "{ }" enclose a set of options.
- The vertical bar "|" indicates a logical OR.
- A comma or keyword followed by an ellipsis within square brackets "[, ... ]" indicates that the preceding item can repeat in a list with the specified separator.
- Parentheses "( )" indicate literal parentheses.
Supported functions and operators
MostSELECT statement clauses support functions. Fieldsreferenced in a function don't need to be listed in anySELECTclause. Therefore, the following query is valid, even though theclicks field is not displayed directly:
#legacySQLSELECTcountry,SUM(clicks)FROMtableGROUPBYcountry;
| Aggregate functions | |
|---|---|
AVG() | Returns the average of the values for a group of rows ... |
BIT_AND() | Returns the result of a bitwise AND operation ... |
BIT_OR() | Returns the result of a bitwise OR operation ... |
BIT_XOR() | Returns the result of a bitwise XOR operation ... |
CORR() | Returns the Pearson correlation coefficient of a set of number pairs. |
COUNT() | Returns the total number of values ... |
COUNT([DISTINCT]) | Returns the total number of non-NULL values ... |
COVAR_POP() | Computes the population covariance of the values ... |
COVAR_SAMP() | Computes the sample covariance of the values ... |
EXACT_COUNT_DISTINCT() | Returns the exact number of non-NULL, distinct values for the specified field. |
FIRST() | Returns the first sequential value in the scope of the function. |
GROUP_CONCAT() | Concatenates multiple strings into a single string ... |
GROUP_CONCAT_UNQUOTED() | Concatenates multiple strings into a single string ... will not add double quotes ... |
LAST() | Returns the last sequential value ... |
MAX() | Returns the maximum value ... |
MIN() | Returns the minimum value ... |
NEST() | Aggregates all values in the current aggregation scope into a repeated field. |
NTH() | Returns the nth sequential value ... |
QUANTILES() | Computes approximate minimum, maximum, and quantiles ... |
STDDEV() | Returns the standard deviation ... |
STDDEV_POP() | Computes the population standard deviation ... |
STDDEV_SAMP() | Computes the sample standard deviation ... |
SUM() | Returns the sum total of the values ... |
TOP() ... COUNT(*) | Returns the top max_records records by frequency. |
UNIQUE() | Returns the set of unique, non-NULL values ... |
VARIANCE() | Computes the variance of the values ... |
VAR_POP() | Computes the population variance of the values ... |
VAR_SAMP() | Computes the sample variance of the values ... |
| Arithmetic operators | |
|---|---|
+ | Addition |
- | Subtraction |
* | Multiplication |
/ | Division |
% | Modulo |
| Bitwise functions | |
|---|---|
& | Bitwise AND |
| | Bitwise OR |
^ | Bitwise XOR |
<< | Bitwise shift left |
>> | Bitwise shift right |
~ | Bitwise NOT |
BIT_COUNT() | Returns the number of bits ... |
| Casting functions | |
|---|---|
BOOLEAN() | Cast to boolean. |
BYTES() | Cast to bytes. |
CAST(expr AS type) | Convertsexpr into a variable of typetype. |
FLOAT() | Cast to double. |
HEX_STRING() | Cast to hexadecimal string. |
INTEGER() | Cast to integer. |
STRING() | Cast to string. |
| Comparison functions | |
|---|---|
expr1 =expr2 | Returnstrue if the expressions are equal. |
expr1 !=expr2expr1 <>expr2 | Returnstrue if the expressions are not equal. |
expr1 >expr2 | Returnstrue ifexpr1 is greater thanexpr2. |
expr1 <expr2 | Returnstrue ifexpr1 is less thanexpr2. |
expr1 >=expr2 | Returnstrue ifexpr1 is greater than or equal toexpr2. |
expr1 <=expr2 | Returnstrue ifexpr1 is less than or equal toexpr2. |
expr1 BETWEENexpr2 ANDexpr3 | Returnstrue if the value ofexpr1 is betweenexpr2 andexpr3, inclusive. |
expr IS NULL | Returnstrue ifexpr is NULL. |
expr IN() | Returnstrue ifexpr matchesexpr1,expr2, or any value in the parentheses. |
COALESCE() | Returns the first argument that isn't NULL. |
GREATEST() | Returns the largestnumeric_expr parameter. |
IFNULL() | If argument is not null, returns the argument. |
IS_INF() | Returnstrue if positive or negative infinity. |
IS_NAN() | Returnstrue if argument isNaN. |
IS_EXPLICITLY_DEFINED() | deprecated: Useexpr IS NOT NULL instead. |
LEAST() | Returns the smallest argumentnumeric_expr parameter. |
NVL() | Ifexpr is not null, returnsexpr, otherwise returnsnull_default. |
| Date and time functions | |
|---|---|
CURRENT_DATE() | Returns current date in the format%Y-%m-%d. |
CURRENT_TIME() | Returns the server's current time in the format%H:%M:%S. |
CURRENT_TIMESTAMP() | Returns the server's current time in the format%Y-%m-%d %H:%M:%S. |
DATE() | Returns the date in the format%Y-%m-%d. |
DATE_ADD() | Adds the specified interval to a TIMESTAMP data type. |
DATEDIFF() | Returns the number of days between two TIMESTAMP data types. |
DAY() | Returns the day of the month as an integer between 1 and 31. |
DAYOFWEEK() | Returns the day of the week as an integer between 1 (Sunday) and 7 (Saturday). |
DAYOFYEAR() | Returns the day of the year as an integer between 1 and 366. |
FORMAT_UTC_USEC() | Returns a UNIX timestamp in the formatYYYY-MM-DD HH:MM:SS.uuuuuu. |
HOUR() | Returns the hour of a TIMESTAMP as an integer between 0 and 23. |
MINUTE() | Returns the minutes of a TIMESTAMP as an integer between 0 and 59. |
MONTH() | Returns the month of a TIMESTAMP as an integer between 1 and 12. |
MSEC_TO_TIMESTAMP() | Converts a UNIX timestamp in milliseconds to a TIMESTAMP. |
NOW() | Returns the current UNIX timestamp in microseconds. |
PARSE_UTC_USEC() | Converts a date string to a UNIX timestamp in microseconds. |
QUARTER() | Returns the quarter of the year of a TIMESTAMP as an integer between 1 and 4. |
SEC_TO_TIMESTAMP() | Converts a UNIX timestamp in seconds to a TIMESTAMP. |
SECOND() | Returns the seconds of a TIMESTAMP as an integer between 0 and 59. |
STRFTIME_UTC_USEC() | Returns a date string in the formatdate_format_str. |
TIME() | Returns a TIMESTAMP in the format%H:%M:%S. |
TIMESTAMP() | Convert a date string to a TIMESTAMP. |
TIMESTAMP_TO_MSEC() | Converts a TIMESTAMP to a UNIX timestamp in milliseconds. |
TIMESTAMP_TO_SEC() | Converts a TIMESTAMP to a UNIX timestamp in seconds. |
TIMESTAMP_TO_USEC() | Converts a TIMESTAMP to a UNIX timestamp in microseconds. |
USEC_TO_TIMESTAMP() | Converts a UNIX timestamp in microseconds to a TIMESTAMP. |
UTC_USEC_TO_DAY() | Shifts a UNIX timestamp in microseconds to the beginning of the day it occurs in. |
UTC_USEC_TO_HOUR() | Shifts a UNIX timestamp in microseconds to the beginning of the hour it occurs in. |
UTC_USEC_TO_MONTH() | Shifts a UNIX timestamp in microseconds to the beginning of the month it occurs in. |
UTC_USEC_TO_WEEK() | Returns a UNIX timestamp in microseconds that represents a day in the week. |
UTC_USEC_TO_YEAR() | Returns a UNIX timestamp in microseconds that represents the year. |
WEEK() | Returns the week of a TIMESTAMP as an integer between 1 and 53. |
YEAR() | Returns the year of a TIMESTAMP. |
| IP functions | |
|---|---|
FORMAT_IP() | Converts 32 least significant bits ofinteger_value to human-readable IPv4 address string. |
PARSE_IP() | Converts a string representing IPv4 address to unsigned integer value. |
FORMAT_PACKED_IP() | Returns a human-readable IP address in the form10.1.5.23 or2620:0:1009:1:216:36ff:feef:3f. |
PARSE_PACKED_IP() | Returns an IP address inBYTES. |
| JSON functions | |
|---|---|
JSON_EXTRACT() | Selects a value according to the JSONPath expression and returns a JSON string. |
JSON_EXTRACT_SCALAR() | Selects a value according to the JSONPath expression and returns a JSON scalar. |
| Logical operators | |
|---|---|
expr ANDexpr | Returnstrue if both expressions are true. |
expr ORexpr | Returnstrue if one or both expressions are true. |
NOTexpr | Returnstrue if the expression is false. |
| Mathematical functions | |
|---|---|
ABS() | Returns the absolute value of the argument. |
ACOS() | Returns the arc cosine of the argument. |
ACOSH() | Returns the arc hyperbolic cosine of the argument. |
ASIN() | Returns the arc sine of the argument. |
ASINH() | Returns the arc hyperbolic sine of the argument. |
ATAN() | Returns the arc tangent of the argument. |
ATANH() | Returns the arc hyperbolic tangent of the argument. |
ATAN2() | Returns the arc tangent of the two arguments. |
CEIL() | Rounds the argument up to the nearest whole number and returns the rounded value. |
COS() | Returns the cosine of the argument. |
COSH() | Returns the hyperbolic cosine of the argument. |
DEGREES() | Converts from radians to degrees. |
EXP() | Returnse to the power of the argument. |
FLOOR() | Rounds the argument down to the nearest whole number. |
LN()LOG() | Returns the natural logarithm of the argument. |
LOG2() | Returns the Base-2 logarithm of the argument. |
LOG10() | Returns the Base-10 logarithm of the argument. |
PI() | Returns the constant π. |
POW() | Returns first argument to the power of the second argument. |
RADIANS() | Converts from degrees to radians. |
RAND() | Returns a random float value in the range 0.0 <= value < 1.0. |
ROUND() | Rounds the argument either up or down to the nearest whole number. |
SIN() | Returns the sine of the argument. |
SINH() | Returns the hyperbolic sine of the argument. |
SQRT() | Returns the square root of the expression. |
TAN() | Returns the tangent of the argument. |
TANH() | Returns the hyperbolic tangent of the argument. |
| Regular expression functions | |
|---|---|
REGEXP_MATCH() | Returns true if the argument matches the regular expression. |
REGEXP_EXTRACT() | Returns the portion of the argument that matches the capturing group within the regular expression. |
REGEXP_REPLACE() | Replaces a substring that matches a regular expression. |
| String functions | |
|---|---|
CONCAT() | Returns the concatenation of two or more strings, or NULL if any of the values are NULL. |
expr CONTAINS 'str' | Returnstrue ifexpr contains the specified string argument. |
INSTR() | Returns the one-based index of the first occurrence of a string. |
LEFT() | Returns the leftmost characters of a string. |
LENGTH() | Returns the length of the string. |
LOWER() | Returns the original string with all characters in lower case. |
LPAD() | Inserts characters to the left of a string. |
LTRIM() | Removes characters from the left side of a string. |
REPLACE() | Replaces all occurrences of a substring. |
RIGHT() | Returns the rightmost characters of a string. |
RPAD() | Inserts characters to the right side of a string. |
RTRIM() | Removes trailing characters from the right side of a string. |
SPLIT() | Splits a string into repeated substrings. |
SUBSTR() | Returns a substring ... |
UPPER() | Returns the original string with all characters in upper case. |
| Table wildcard functions | |
|---|---|
TABLE_DATE_RANGE() | Queries multiple daily tables that span a date range. |
TABLE_DATE_RANGE_STRICT() | Queries multiple daily tables that span a date range, with no missing dates. |
TABLE_QUERY() | Queries tables whose names match a specified predicate. |
| URL functions | |
|---|---|
HOST() | Given a URL, returns the host name as a string. |
DOMAIN() | Given a URL, returns the domain as a string. |
TLD() | Given a URL, returns the top level domain plus any country domain in the URL. |
| Window functions | |
|---|---|
AVG()COUNT(*)COUNT([DISTINCT])MAX()MIN()STDDEV()SUM() | The same operation as the correspondingAggregate functions, but are computed over a window defined by the OVER clause. |
CUME_DIST() | Returns a double that indicates the cumulative distribution of a value in a group of values ... |
DENSE_RANK() | Returns the integer rank of a value in a group of values. |
FIRST_VALUE() | Returns the first value of the specified field in the window. |
LAG() | Enables you to read data from a previous row within a window. |
LAST_VALUE() | Returns the last value of the specified field in the window. |
LEAD() | Enables you to read data from a following row within a window. |
NTH_VALUE() | Returns the value of<expr> at position<n> of the window frame ... |
NTILE() | Divides the window into the specified number of buckets. |
PERCENT_RANK() | Returns the rank of the current row, relative to the other rows in the partition. |
PERCENTILE_CONT() | Returns an interpolated value that would map to the percentile argument with respect to the window ... |
PERCENTILE_DISC() | Returns the value nearest the percentile of the argument over the window. |
RANK() | Returns the integer rank of a value in a group of values. |
RATIO_TO_REPORT() | Returns the ratio of each value to the sum of the values. |
ROW_NUMBER() | Returns the current row number of the query result over the window. |
| Other functions | |
|---|---|
CASE WHEN ... THEN | Use CASE to choose among two or more alternate expressions in your query. |
CURRENT_USER() | Returns the email address of the user running the query. |
EVERY() | Returns true if the argument is true for all of its inputs. |
FROM_BASE64() | Converts the base-64 encoded input string into BYTES format. |
HASH() | Computes and returns a 64-bit signed hash value ... |
FARM_FINGERPRINT() | Computes and returns a 64-bit signed fingerprint value ... |
IF() | If first argument is true, returns second argument; otherwise returns third argument. |
POSITION() | Returns the one-based, sequential position of the argument. |
SHA1() | Returns aSHA1 hash, in BYTES format. |
SOME() | Returns true if argument is true for at least one of its inputs. |
TO_BASE64() | Converts the BYTES argument to a base-64 encoded string. |
Aggregate functions
Aggregate functions return values that represent summaries of larger sets of data, which makes these functions particularly useful for analyzing logs. An aggregate function operates against a collection of values and returns a single value per table, group, or scope:
- Table aggregation
Uses an aggregate function to summarize all qualifying rows in the table. For example:
SELECT COUNT(f1) FROM ds.Table; - Group aggregation
Uses an aggregate function and a
GROUP BYclause that specifies a non-aggregated field to summarize rows by group. For example:SELECT COUNT(f1) FROM ds.Table GROUP BY b1;TheTOP function represents a specialized case of group aggregation.
- Scoped aggregation
This feature applies only to tables that havenested fields.
Uses an aggregate function and theWITHINkeyword to aggregate repeated values within a defined scope. For example:SELECT COUNT(m1.f2) WITHIN RECORD FROM Table;The scope can be
RECORD, which corresponds to entire row, or a node (repeated field in a row). Aggregation functions operate over the values within the scope and return aggregated results for each record or node.
You can apply a restriction to an aggregate function using one of the following options:
An alias in a subselect query. The restriction is specified in the outer
WHEREclause.#legacySQLSELECTcorpus,count_corpus_wordsFROM(SELECTcorpus,count(word)AScount_corpus_wordsFROM[bigquery-public-data:samples.shakespeare]GROUPBYcorpus)ASsub_shakespeareWHEREcount_corpus_words >4000
An alias in aHAVING clause.
#legacySQLSELECTcorpus,count(word)AScount_corpus_wordsFROM[bigquery-public-data:samples.shakespeare]GROUPBYcorpusHAVINGcount_corpus_words >4000;
You can also refer to an alias in theGROUP BY orORDER BY clauses.
Syntax
| Aggregate functions | |
|---|---|
AVG() | Returns the average of the values for a group of rows ... |
BIT_AND() | Returns the result of a bitwise AND operation ... |
BIT_OR() | Returns the result of a bitwise OR operation ... |
BIT_XOR() | Returns the result of a bitwise XOR operation ... |
CORR() | Returns the Pearson correlation coefficient of a set of number pairs. |
COUNT() | Returns the total number of values ... |
COUNT([DISTINCT]) | Returns the total number of non-NULL values ... |
COVAR_POP() | Computes the population covariance of the values ... |
COVAR_SAMP() | Computes the sample covariance of the values ... |
EXACT_COUNT_DISTINCT() | Returns the exact number of non-NULL, distinct values for the specified field. |
FIRST() | Returns the first sequential value in the scope of the function. |
GROUP_CONCAT() | Concatenates multiple strings into a single string ... |
GROUP_CONCAT_UNQUOTED() | Concatenates multiple strings into a single string ... will not add double quotes ... |
LAST() | Returns the last sequential value ... |
MAX() | Returns the maximum value ... |
MIN() | Returns the minimum value ... |
NEST() | Aggregates all values in the current aggregation scope into a repeated field. |
NTH() | Returns the nth sequential value ... |
QUANTILES() | Computes approximate minimum, maximum, and quantiles ... |
STDDEV() | Returns the standard deviation ... |
STDDEV_POP() | Computes the population standard deviation ... |
STDDEV_SAMP() | Computes the sample standard deviation ... |
SUM() | Returns the sum total of the values ... |
TOP() ... COUNT(*) | Returns the top max_records records by frequency. |
UNIQUE() | Returns the set of unique, non-NULL values ... |
VARIANCE() | Computes the variance of the values ... |
VAR_POP() | Computes the population variance of the values ... |
VAR_SAMP() | Computes the sample variance of the values ... |
AVG(numeric_expr)- Returns the average of the values for a group of rows computed by
numeric_expr. Rows with a NULL value are not included in the calculation. BIT_AND(numeric_expr)- Returns the result of a bitwise
ANDoperation between each instance ofnumeric_expracross all rows.NULLvalues are ignored. This function returnsNULLif all instances ofnumeric_exprevaluate toNULL. BIT_OR(numeric_expr)- Returns the result of a bitwise
ORoperation between each instance ofnumeric_expracross all rows.NULLvalues are ignored. This function returnsNULLif all instances ofnumeric_exprevaluate toNULL. BIT_XOR(numeric_expr)- Returns the result of a bitwise
XORoperation between each instance ofnumeric_expracross all rows.NULLvalues are ignored. This function returnsNULLif all instances ofnumeric_exprevaluate toNULL. CORR(numeric_expr,numeric_expr)- Returns the Pearson correlation coefficient of a set of number pairs.
COUNT(*)- Returns the total number of values (NULL and non-NULL) in the scope of the function. Unless you are using
COUNT(*)with theTOPfunction, it is better to explicitly specify the field to count. COUNT([DISTINCT]field [,n])- Returns the total number of non-NULL values in the scope of the function.
If you use the
DISTINCTkeyword, the function returns the number ofdistinct values for the specified field. Note that the returned value forDISTINCTis astatistical approximation and is not guaranteed to be exact.Use
EXACT_COUNT_DISTINCT()for an exact answer.If you require greater accuracy from
, you can specify a second parameter,COUNT(DISTINCT)n, which gives the threshold below which exact results are guaranteed. By default,nis 1000, but if you give a largern, you will get exact results forCOUNT(DISTINCT)up to that value ofn. However, giving larger values ofnwill reduce scalability of this operator and may substantially increase query execution time or cause the query to fail.To compute the exact number of distinct values, useEXACT_COUNT_DISTINCT. Or, for a more scalable approach, consider using
GROUP EACH BYon the relevant field(s) and then applyingCOUNT(*). TheGROUP EACH BYapproach is more scalable but might incur a slight up-front performance penalty. COVAR_POP(numeric_expr1,numeric_expr2)- Computes thepopulation covariance of the values computed by
numeric_expr1andnumeric_expr2. COVAR_SAMP(numeric_expr1,numeric_expr2)- Computes thesample covariance of the values computed by
numeric_expr1andnumeric_expr2. EXACT_COUNT_DISTINCT(field)- Returns the exact number of non-NULL, distinct values for the specified field. For better scalability and performance, useCOUNT(DISTINCTfield).
FIRST(expr)- Returns the first sequential value in the scope of the function.
GROUP_CONCAT('str' [,separator])Concatenates multiple strings into a single string, where each value is separated by the optional
separatorparameter. Ifseparatoris omitted, BigQuery returns a comma-separated string.If a string in the source data contains a double quote character,
GROUP_CONCATreturns the string with double quotes added. For example, the stringa"bwould return as"a""b". UseGROUP_CONCAT_UNQUOTEDif you prefer that these strings do not return with double quotes added.Example:
#legacySQLSELECTGROUP_CONCAT(x)FROM(SELECT'a"b'ASx),(SELECT'cd'ASx);
GROUP_CONCAT_UNQUOTED('str' [,separator])Concatenates multiple strings into a single string, where each value is separated by the optional
separatorparameter. Ifseparatoris omitted, BigQuery returns a comma-separated string.Unlike
GROUP_CONCAT, this function will not add double quotes to returned values that include a double quote character. For example, the stringa"bwould return asa"b.Example:
#legacySQLSELECTGROUP_CONCAT_UNQUOTED(x)FROM(SELECT'a"b'ASx),(SELECT'cd'ASx);
LAST(field)- Returns the last sequential value in the scope of the function.
MAX(field)- Returns the maximum value in the scope of the function.
MIN(field)- Returns the minimum value in the scope of the function.
NEST(expr)Aggregates all values in the current aggregation scope into a repeated field. For example, the query
"SELECT x, NEST(y) FROM ... GROUP BY x"returns one output record for each distinctxvalue, and contains a repeated field for allyvalues paired withxin the query input. TheNESTfunction requires aGROUP BYclause.BigQuery automatically flattens query results, so if you use the
NESTfunction on the top level query, the results won't contain repeated fields. Use theNESTfunction when using a subselect that produces intermediate results for immediate use by the same query.NTH(n,field)- Returns the
nth sequential value in the scope of the function, wherenis a constant. TheNTHfunction starts counting at 1, so there is no zeroth term. If the scope of the function has less thannvalues, the function returnsNULL. QUANTILES(expr[,buckets])Computes approximate minimum, maximum, and quantiles for the input expression.
NULLinput values are ignored. Empty or exclusively-NULLinput results inNULLoutput. The number of quantiles computed is controlled with the optionalbucketsparameter, which includes the minimum and maximum in the count. To compute approximate N-tiles, use N+1buckets. The default value ofbucketsis 100. (Note: The default of 100 does not estimate percentiles. To estimate percentiles, use 101bucketsat minimum.) If specified explicitly,bucketsmust be at least 2.The fractional error per quantile is epsilon = 1 /
buckets, which means that the error decreases as the number of buckets increases. For example:QUANTILES(<expr>, 2) # computes min and max with 50% error.QUANTILES(<expr>, 3) # computes min, median, and max with 33% error.QUANTILES(<expr>, 5) # computes quartiles with 25% error.QUANTILES(<expr>, 11) # computes deciles with 10% error.QUANTILES(<expr>, 21) # computes vigintiles with 5% error.QUANTILES(<expr>, 101) # computes percentiles with 1% error.
The
NTHfunction can be used to pick a particular quantile, but remember thatNTHis 1-based, and thatQUANTILESreturns the minimum ("0th" quantile) in the first position, and the maximum ("100th" percentile or "Nth" N-tile) in the last position. For example,NTH(11, QUANTILES(expr, 21))estimates the median ofexpr, whereasNTH(20, QUANTILES(expr, 21))estimates the 19th vigintile (95th percentile) ofexpr. Both estimates have a 5% margin of error.To improve accuracy, use more buckets. For example, to reduce the margin of error for the previous calculations from 5% to 0.1%, use 1001 buckets instead of 21, and adjust the argument to the
NTHfunction accordingly. To calculate the median with 0.1% error, useNTH(501, QUANTILES(expr, 1001)); for the 95th percentile with 0.1% error, useNTH(951, QUANTILES(expr, 1001)).STDDEV(numeric_expr)- Returns the standard deviation of the values computed by
numeric_expr. Rows with a NULL value are not included in the calculation. TheSTDDEVfunction is an alias forSTDDEV_SAMP. STDDEV_POP(numeric_expr)- Computes thepopulation standard deviation of the value computed by
numeric_expr. UseSTDDEV_POP()to compute the standard deviation of a dataset that encompasses the entire population of interest. If your dataset comprises only a representative sample of the population, useSTDDEV_SAMP()instead. For more information about population versus sample standard deviation, seeStandard deviation on Wikipedia. STDDEV_SAMP(numeric_expr)- Computes thesample standard deviation of the value computed by
numeric_expr. UseSTDDEV_SAMP()to compute the standard deviation of an entire population based on a representative sample of the population. If your dataset comprises the entire population, useSTDDEV_POP()instead. For more information about population versus sample standard deviation, seeStandard deviation on Wikipedia. SUM(field)- Returns the sum total of the values in the scope of the function. For use with numerical data types only.
TOP(field|alias[,max_values][,multiplier]) ... COUNT(*)- Returns the topmax_records records by frequency. See theTOP description below for details.
UNIQUE(expr)- Returns the set of unique, non-NULL values in the scope of the function in an undefined order. Similar to a large
GROUP BYclause without theEACHkeyword, the query will fail with a "Resources Exceeded" error if there are too many distinct values. UnlikeGROUP BY, however, theUNIQUEfunction can be applied with scoped aggregation, allowing efficient operation on nested fields with a limited number of values. VARIANCE(numeric_expr)- Computes the variance of the values computed by
numeric_expr. Rows with a NULL value are not included in the calculation. TheVARIANCEfunction is an alias forVAR_SAMP. VAR_POP(numeric_expr)- Computes thepopulation variance of the values computed by
numeric_expr. For more information about population versus sample standard deviation, seeStandard deviation on Wikipedia. VAR_SAMP(numeric_expr)- Computes thesample variance of the values computed by
numeric_expr. For more information about population versus sample standard deviation, seeStandard deviation on Wikipedia.
TOP() function
TOP is a function that is an alternative to the GROUP BY clause. It is used as simplified syntax forGROUP BY ... ORDER BY ... LIMIT .... Generally, the TOP function performs faster than the full... GROUP BY ... ORDER BY ... LIMIT ... query, but may only return approximate results. The following is the syntax for the TOP function:
TOP(field|alias[,max_values][,multiplier]) ... COUNT(*)
When using TOP in aSELECT clause, you must includeCOUNT(*) as one of the fields.
A query that uses the TOP() function can return only two fields: the TOP field, and the COUNT(*) value.
field|alias- The field or alias to return.
max_values- [Optional] The maximum number of results to return. Default is 20.
multiplier- A positive integer that increases the value(s) returned by
COUNT(*)by the multiple specified.
TOP() examples
Basic example queries that use
TOP()The following queries use
TOP()to return 10 rows.Example 1:
#legacySQLSELECTTOP(word,10)asword,COUNT(*)ascntFROM[bigquery-public-data:samples.shakespeare]WHEREwordCONTAINS"th";
Example 2:
#legacySQLSELECTword,left(word,3)FROM(SELECTTOP(word,10)ASword,COUNT(*)FROM[bigquery-public-data:samples.shakespeare]WHEREwordCONTAINS"th");
Compare
TOP()toGROUP BY...ORDER BY...LIMITThe query returns, in order, the top 10 most frequently used words containing"th", and the number of documents the words was used in. The
TOPquery will execute much faster:Example without
TOP():#legacySQLSELECTword,COUNT(*)AScntFROMds.TableWHEREwordCONTAINS'th'GROUPBYwordORDERBYcntDESCLIMIT10;
Example with
TOP():#legacySQLSELECTTOP(word,10),COUNT(*)FROMds.TableWHEREwordcontains'th';
Using the
multiplierparameter.The following queries show how the
multiplierparameter affects the query result. The first query returns the number of births per month in Wyoming. The second query uses tomultiplierparameter to multiply thecntvalues by 100.Example without the
multiplierparameter:#legacySQLSELECTTOP(month,3)asmonth,COUNT(*)ascntFROM[bigquery-public-data:samples.natality]WHEREstate="WY";
Returns:
+-------+-------+| month | cnt |+-------+-------+| 7 | 19594 || 5 | 19038 || 8 | 19030 |+-------+-------+
Example with the
multiplierparameter:#legacySQLSELECTTOP(month,3,100)asmonth,COUNT(*)ascntFROM[bigquery-public-data:samples.natality]WHEREstate="WY";
Returns:
+-------+---------+| month | cnt |+-------+---------+| 7 | 1959400 || 5 | 1903800 || 8 | 1903000 |+-------+---------+
Note: You must includeCOUNT(*) in theSELECT clause to useTOP.
Advanced examples
Average and standard deviation grouped by condition
The following query returns the average and standard deviation of birth weights in Ohio in 2003, grouped by mothers who do and do not smoke.
Example:
#legacySQLSELECTcigarette_use,/* Finds average and standard deviation */AVG(weight_pounds)baby_weight,STDDEV(weight_pounds)baby_weight_stdev,AVG(mother_age)mother_ageFROM[bigquery-public-data:samples.natality]WHEREyear=2003ANDstate='OH'/* Group the result values by those *//* who smoked and those who didn't. */GROUPBYcigarette_use;
Filter query results using an aggregated value
In order to filter query results using an aggregated value (for example, filtering by the value of a
SUM), use theHAVINGfunction.HAVINGcompares a value to a result determined by an aggregation function, as opposed toWHERE, which operates on each row prior to aggregation.Example:
#legacySQLSELECTstate,/* If 'is_male' is True, return 'Male', *//* otherwise return 'Female' */IF(is_male,'Male','Female')ASsex,/* The count value is aliased as 'cnt' *//* and used in the HAVING clause below. */COUNT(*)AScntFROM[bigquery-public-data:samples.natality]WHEREstate!=''GROUPBYstate,sexHAVINGcnt >3000000ORDERBYcntDESC
Returns:
+-------+--------+---------+| state | sex | cnt |+-------+--------+---------+| CA | Male | 7060826 || CA | Female | 6733288 || TX | Male | 5107542 || TX | Female | 4879247 || NY | Male | 4442246 || NY | Female | 4227891 || IL | Male | 3089555 |+-------+--------+---------+
Arithmetic operators
Arithmetic operators take numeric arguments and return a numeric result. Each argument can be a numeric literal or a numeric value returned by a query. If the arithmetic operation evaluates to an undefined result, the operation returnsNULL.
Syntax
| Operator | Description | Example |
|---|---|---|
| + | Addition |
Returns:10 |
| - | Subtraction |
Returns:1 |
| * | Multiplication |
Returns:24 |
| / | Division |
Returns:1.5 |
| % | Modulo |
Returns:2 |
Bitwise functions
Bitwise functions operate at the level of individual bits and require numerical arguments. For more information about bitwise functions, seeBitwise operation.
Three additional bitwise functions,BIT_AND,BIT_OR andBIT_XOR, are documented inaggregate functions.
Syntax
| Operator | Description | Example |
|---|---|---|
| & | Bitwise AND |
Returns:0 |
| | | Bitwise OR |
Returns:28 |
| ^ | Bitwise XOR |
Returns:1 |
| << | Bitwise shift left |
Returns:16 |
| >> | Bitwise shift right |
Returns:2 |
| ~ | Bitwise NOT |
Returns:-3 |
BIT_COUNT(<numeric_expr>) | Returns the number of bits that are set in |
Returns:4 |
Casting functions
Casting functions change the data type of a numeric expression. Casting functions are particularly useful for ensuring that arguments in a comparison function have the same data type.
Syntax
| Casting functions | |
|---|---|
BOOLEAN() | Cast to boolean. |
BYTES() | Cast to bytes. |
CAST(expr AS type) | Convertsexpr into a variable of typetype. |
FLOAT() | Cast to double. |
HEX_STRING() | Cast to hexadecimal string. |
INTEGER() | Cast to integer. |
STRING() | Cast to string. |
BOOLEAN(<numeric_expr>)- Returns
trueif<numeric_expr>is not 0 and not NULL. - Returns
falseif<numeric_expr>is 0. - Returns
NULLif<numeric_expr>is NULL.
- Returns
BYTES(string_expr)- Returns
string_expras a value of typebytes. CAST(expr AStype)- Converts
exprinto a variable of typetype. FLOAT(expr)- Returns
expras a double. Theexprcan be a string like'45.78', but the function returnsNULLfor non-numeric values. HEX_STRING(numeric_expr)- Returns
numeric_expras a hexadecimal string. INTEGER(expr)- Casts
exprto a 64-bit integer.- Returns NULL if
expris a string that doesn't correspond to an integer value. - Returns the number of microseconds since the unix epoch if
expris a timestamp.
- Returns NULL if
STRING(numeric_expr)- Returns
numeric_expras a string.
Comparison functions
Comparison functions returntrue orfalse, based on the following types of comparisons:
- A comparison of two expressions.
- A comparison of an expression or set of expressions to a specific criteria, such as being in a specified list, being NULL, or being a non-default optional value.
Some of the functions listed below return values other thantrue orfalse, but the values they return are based on comparison operations.
You can use either numeric or string expressions as arguments for comparison functions. (String constants must be enclosed in single or double quotes.) The expressions can be literals or values fetched by a query. Comparison functions are most often used as filtering conditions inWHERE clauses, but they can be used in other clauses.
Syntax
| Comparison functions | |
|---|---|
expr1 =expr2 | Returnstrue if the expressions are equal. |
expr1 !=expr2expr1 <>expr2 | Returnstrue if the expressions are not equal. |
expr1 >expr2 | Returnstrue ifexpr1 is greater thanexpr2. |
expr1 <expr2 | Returnstrue ifexpr1 is less thanexpr2. |
expr1 >=expr2 | Returnstrue ifexpr1 is greater than or equal toexpr2. |
expr1 <=expr2 | Returnstrue ifexpr1 is less than or equal toexpr2. |
expr1 BETWEENexpr2 ANDexpr3 | Returnstrue if the value ofexpr1 is betweenexpr2 andexpr3, inclusive. |
expr IS NULL | Returnstrue ifexpr is NULL. |
expr IN() | Returnstrue ifexpr matchesexpr1,expr2, or any value in the parentheses. |
COALESCE() | Returns the first argument that isn't NULL. |
GREATEST() | Returns the largestnumeric_expr parameter. |
IFNULL() | If argument is not null, returns the argument. |
IS_INF() | Returnstrue if positive or negative infinity. |
IS_NAN() | Returnstrue if argument isNaN. |
IS_EXPLICITLY_DEFINED() | deprecated: Useexpr IS NOT NULL instead. |
LEAST() | Returns the smallest argumentnumeric_expr parameter. |
NVL() | Ifexpr is not null, returnsexpr, otherwise returnsnull_default. |
expr1 =expr2- Returns
trueif the expressions are equal. expr1 !=expr2expr1 <>expr2- Returns
trueif the expressions are not equal. expr1 >expr2- Returns
trueifexpr1is greater thanexpr2. expr1 <expr2- Returns
trueifexpr1is less thanexpr2. expr1 >=expr2- Returns
trueifexpr1is greater than or equal toexpr2. expr1 <=expr2- Returns
trueifexpr1is less than or equal toexpr2. expr1 BETWEENexpr2 ANDexpr3Returns
trueif the value ofexpr1is greater than or equal toexpr2, and less than or equal toexpr3.expr IS NULL- Returns
trueifexpris NULL. expr IN(expr1, expr2, ...)- Returns
trueifexprmatchesexpr1,expr2, or any value in the parentheses. TheINkeyword is an efficient shorthand for(expr =expr1 || expr =expr2 || ...). The expressions used with theINkeyword must be constants and they must match the data type ofexpr. TheINclause can also be used to create semi-joins and anti-joins. For more information, seeSemi-join and Anti-join. COALESCE(<expr1>,<expr2>, ...)- Returns the first argument that isn't NULL.
GREATEST(numeric_expr1,numeric_expr2, ...)Returns the largest
numeric_exprparameter. All parameters must be numeric, and all parameters must be the same type. If any parameter isNULL, this function returnsNULL.To ignore
NULLvalues, use theIFNULLfunction to changeNULLvalues to a value that doesn't affect the comparison. In the following code example, theIFNULLfunction is used to changeNULLvalues to-1, which doesn't affect the comparison between positive numbers.SELECTGREATEST(IFNULL(a,-1),IFNULL(b,-1))FROM(SELECT1asa,NULLasb);
IFNULL(expr,null_default)- If
expris not null, returnsexpr, otherwise returnsnull_default. IS_INF(numeric_expr)- Returns
trueifnumeric_expris positive or negative infinity. IS_NAN(numeric_expr)- Returns
trueifnumeric_expris the specialNaNnumeric value. IS_EXPLICITLY_DEFINED(expr)This function is deprecated. Use
expr IS NOT NULLinstead.LEAST(numeric_expr1,numeric_expr2, ...)Returns the smallest
numeric_exprparameter. All parameters must be numeric, and all parameters must be the same type. If any parameter isNULL, this function returnsNULLNVL(expr,null_default)- If
expris not null, returnsexpr, otherwise returnsnull_default. TheNVLfunction is an alias forIFNULL.
Date and time functions
The following functions enable date and time manipulation for UNIX timestamps, date strings and TIMESTAMP data types. For more information about working with the TIMESTAMP data type, seeUsing TIMESTAMP.
Date and time functions that work with UNIX timestamps operate onUNIX time. Date and time functions return values based upon the UTC time zone.
Syntax
| Date and time functions | |
|---|---|
CURRENT_DATE() | Returns current date in the format%Y-%m-%d. |
CURRENT_TIME() | Returns the server's current time in the format%H:%M:%S. |
CURRENT_TIMESTAMP() | Returns the server's current time in the format%Y-%m-%d %H:%M:%S. |
DATE() | Returns the date in the format%Y-%m-%d. |
DATE_ADD() | Adds the specified interval to a TIMESTAMP data type. |
DATEDIFF() | Returns the number of days between two TIMESTAMP data types. |
DAY() | Returns the day of the month as an integer between 1 and 31. |
DAYOFWEEK() | Returns the day of the week as an integer between 1 (Sunday) and 7 (Saturday). |
DAYOFYEAR() | Returns the day of the year as an integer between 1 and 366. |
FORMAT_UTC_USEC() | Returns a UNIX timestamp in the formatYYYY-MM-DD HH:MM:SS.uuuuuu. |
HOUR() | Returns the hour of a TIMESTAMP as an integer between 0 and 23. |
MINUTE() | Returns the minutes of a TIMESTAMP as an integer between 0 and 59. |
MONTH() | Returns the month of a TIMESTAMP as an integer between 1 and 12. |
MSEC_TO_TIMESTAMP() | Converts a UNIX timestamp in milliseconds to a TIMESTAMP. |
NOW() | Returns the current UNIX timestamp in microseconds. |
PARSE_UTC_USEC() | Converts a date string to a UNIX timestamp in microseconds. |
QUARTER() | Returns the quarter of the year of a TIMESTAMP as an integer between 1 and 4. |
SEC_TO_TIMESTAMP() | Converts a UNIX timestamp in seconds to a TIMESTAMP. |
SECOND() | Returns the seconds of a TIMESTAMP as an integer between 0 and 59. |
STRFTIME_UTC_USEC() | Returns a date string in the formatdate_format_str. |
TIME() | Returns a TIMESTAMP in the format%H:%M:%S. |
TIMESTAMP() | Convert a date string to a TIMESTAMP. |
TIMESTAMP_TO_MSEC() | Converts a TIMESTAMP to a UNIX timestamp in milliseconds. |
TIMESTAMP_TO_SEC() | Converts a TIMESTAMP to a UNIX timestamp in seconds. |
TIMESTAMP_TO_USEC() | Converts a TIMESTAMP to a UNIX timestamp in microseconds. |
USEC_TO_TIMESTAMP() | Converts a UNIX timestamp in microseconds to a TIMESTAMP. |
UTC_USEC_TO_DAY() | Shifts a UNIX timestamp in microseconds to the beginning of the day it occurs in. |
UTC_USEC_TO_HOUR() | Shifts a UNIX timestamp in microseconds to the beginning of the hour it occurs in. |
UTC_USEC_TO_MONTH() | Shifts a UNIX timestamp in microseconds to the beginning of the month it occurs in. |
UTC_USEC_TO_WEEK() | Returns a UNIX timestamp in microseconds that represents a day in the week. |
UTC_USEC_TO_YEAR() | Returns a UNIX timestamp in microseconds that represents the year. |
WEEK() | Returns the week of a TIMESTAMP as an integer between 1 and 53. |
YEAR() | Returns the year of a TIMESTAMP. |
CURRENT_DATE()Returns a human-readable string of the current date in the format
%Y-%m-%d.Example:
SELECT CURRENT_DATE();Returns:2013-02-01
CURRENT_TIME()Returns a human-readable string of the server's current time in the format
%H:%M:%S.Example:
SELECT CURRENT_TIME();Returns:01:32:56
CURRENT_TIMESTAMP()Returns a TIMESTAMP data type of the server's current time in the format
%Y-%m-%d %H:%M:%S.Example:
SELECT CURRENT_TIMESTAMP();Returns:2013-02-01 01:33:35 UTC
DATE(<timestamp>)Returns a human-readable string of a TIMESTAMP data type in the format
%Y-%m-%d.Example:
SELECT DATE(TIMESTAMP('2012-10-01 02:03:04'));Returns:2012-10-01
DATE_ADD(<timestamp>,<interval>,
<interval_units>)Adds the specified interval to a TIMESTAMP data type. Possible
interval_unitsvalues includeYEAR,MONTH,DAY,HOUR,MINUTE, andSECOND. Ifintervalis a negative number, the interval is subtracted from the TIMESTAMP data type.Example:
SELECT DATE_ADD(TIMESTAMP("2012-10-01 02:03:04"), 5, "YEAR");Returns:2017-10-01 02:03:04 UTC
SELECT DATE_ADD(TIMESTAMP("2012-10-01 02:03:04"), -5, "YEAR");Returns:2007-10-01 02:03:04 UTC
DATEDIFF(<timestamp1>,<timestamp2>)Returns the number of days between two TIMESTAMP data types. The result is positive if the first TIMESTAMP data type comes after the second TIMESTAMP data type, and otherwise the result is negative.
Example:
SELECT DATEDIFF(TIMESTAMP('2012-10-02 05:23:48'), TIMESTAMP('2011-06-24 12:18:35'));Returns:466
Example:
SELECT DATEDIFF(TIMESTAMP('2011-06-24 12:18:35'), TIMESTAMP('2012-10-02 05:23:48'));Returns:-466
DAY(<timestamp>)Returns the day of the month of a TIMESTAMP data type as an integer between 1 and 31, inclusively.
Example:
SELECT DAY(TIMESTAMP('2012-10-02 05:23:48'));Returns:2
DAYOFWEEK(<timestamp>)Returns the day of the week of a TIMESTAMP data type as an integer between 1 (Sunday) and 7 (Saturday), inclusively.
Example:
SELECT DAYOFWEEK(TIMESTAMP("2012-10-01 02:03:04"));Returns:2
DAYOFYEAR(<timestamp>)Returns the day of the year of a TIMESTAMP data type as an integer between 1 and 366, inclusively. The integer 1 refers to January 1.
Example:
SELECT DAYOFYEAR(TIMESTAMP("2012-10-01 02:03:04"));Returns:275
FORMAT_UTC_USEC(<unix_timestamp>)Returns a human-readable string representation of a UNIX timestamp in the format
YYYY-MM-DD HH:MM:SS.uuuuuu.Example:
SELECT FORMAT_UTC_USEC(1274259481071200);Returns:2010-05-19 08:58:01.071200
HOUR(<timestamp>)Returns the hour of a TIMESTAMP data type as an integer between 0 and 23, inclusively.
Example:
SELECT HOUR(TIMESTAMP('2012-10-02 05:23:48'));Returns:5
MINUTE(<timestamp>)Returns the minutes of a TIMESTAMP data type as an integer between 0 and 59, inclusively.
Example:
SELECT MINUTE(TIMESTAMP('2012-10-02 05:23:48'));Returns:23
MONTH(<timestamp>)Returns the month of a TIMESTAMP data type as an integer between 1 and 12, inclusively.
Example:
SELECT MONTH(TIMESTAMP('2012-10-02 05:23:48'));Returns:10
MSEC_TO_TIMESTAMP(<expr>)- Converts a UNIX timestamp in milliseconds to a TIMESTAMP data type.
Example:
SELECT MSEC_TO_TIMESTAMP(1349053323000);Returns:2012-10-01 01:02:03 UTC
SELECT MSEC_TO_TIMESTAMP(1349053323000 + 1000)Returns:2012-10-01 01:02:04 UTC
NOW()Returns the current UNIX timestamp in microseconds.
Example:
SELECT NOW();Returns:1359685811687920
PARSE_UTC_USEC(<date_string>)Converts a date string to a UNIX timestamp in microseconds.
date_stringmust have the formatYYYY-MM-DD HH:MM:SS[.uuuuuu]. The fractional part of the second can be up to 6 digits long or can be omitted.TIMESTAMP_TO_USEC is an equivalent function that converts a TIMESTAMP data type argument instead of a date string.
Example:
SELECT PARSE_UTC_USEC("2012-10-01 02:03:04");Returns:1349056984000000
QUARTER(<timestamp>)Returns the quarter of the year of a TIMESTAMP data type as an integer between 1 and 4, inclusively.
Example:
SELECT QUARTER(TIMESTAMP("2012-10-01 02:03:04"));Returns:4
SEC_TO_TIMESTAMP(<expr>)Converts a UNIX timestamp in seconds to a TIMESTAMP data type.
Example:
SELECT SEC_TO_TIMESTAMP(1355968987);Returns:2012-12-20 02:03:07 UTC
SELECT SEC_TO_TIMESTAMP(INTEGER(1355968984 + 3));Returns:2012-12-20 02:03:07 UTC
SECOND(<timestamp>)Returns the seconds of a TIMESTAMP data type as an integer between 0 and 59, inclusively.
During aleap second, the integer range is between 0 and 60, inclusively.
Example:
SELECT SECOND(TIMESTAMP('2012-10-02 05:23:48'));Returns:48
STRFTIME_UTC_USEC(<unix_timestamp>,
<date_format_str>)Returns a human-readable date string in the formatdate_format_str.date_format_str can include date-related punctuation characters (such as/ and-) and special characters accepted by thestrftime function in C++ (such as%d for day of month).
Use the
UTC_USEC_TO_<function_name>functions if you plan to group query data by time intervals, such as getting all data for a certain month, because the functions are more efficient.Example:
SELECT STRFTIME_UTC_USEC(1274259481071200, "%Y-%m-%d");Returns:2010-05-19
TIME(<timestamp>)Returns a human-readable string of a TIMESTAMP data type, in the format
%H:%M:%S.Example:
SELECT TIME(TIMESTAMP('2012-10-01 02:03:04'));Returns:02:03:04
TIMESTAMP(<date_string>)Convert a date string to a TIMESTAMP data type.
Example:
SELECT TIMESTAMP("2012-10-01 01:02:03");Returns:2012-10-01 01:02:03 UTC
TIMESTAMP_TO_MSEC(<timestamp>)Converts a TIMESTAMP data type to a UNIX timestamp in milliseconds.
Example:
SELECT TIMESTAMP_TO_MSEC(TIMESTAMP("2012-10-01 01:02:03"));Returns:1349053323000
TIMESTAMP_TO_SEC(<timestamp>)- Converts a TIMESTAMP data type to a UNIX timestamp in seconds.
Example:
SELECT TIMESTAMP_TO_SEC(TIMESTAMP("2012-10-01 01:02:03"));Returns:1349053323
TIMESTAMP_TO_USEC(<timestamp>)Converts a TIMESTAMP data type to a UNIX timestamp in microseconds.
PARSE_UTC_USEC is an equivalent function that converts a data string argument instead of a TIMESTAMP data type.
Example:
SELECT TIMESTAMP_TO_USEC(TIMESTAMP("2012-10-01 01:02:03"));Returns:1349053323000000
USEC_TO_TIMESTAMP(<expr>)Converts a UNIX timestamp in microseconds to a TIMESTAMP data type.
Example:
SELECT USEC_TO_TIMESTAMP(1349053323000000);Returns:2012-10-01 01:02:03 UTC
SELECT USEC_TO_TIMESTAMP(1349053323000000 + 1000000)Returns:2012-10-01 01:02:04 UTC
UTC_USEC_TO_DAY(<unix_timestamp>)Shifts a UNIX timestamp in microseconds to the beginning of the day it occurs in.
For example, if
unix_timestampoccurs on May 19th at 08:58, this function returns a UNIX timestamp for May 19th at 00:00 (midnight).Example:
SELECT UTC_USEC_TO_DAY(1274259481071200);Returns:1274227200000000
UTC_USEC_TO_HOUR(<unix_timestamp>)Shifts a UNIX timestamp in microseconds to the beginning of the hour it occurs in.
For example, if
unix_timestampoccurs at 08:58, this function returns a UNIX timestamp for 08:00 on the same day.Example:
SELECT UTC_USEC_TO_HOUR(1274259481071200);Returns:1274256000000000
UTC_USEC_TO_MONTH(<unix_timestamp>)Shifts a UNIX timestamp in microseconds to the beginning of the month it occurs in.
For example, if
unix_timestampoccurs on March 19th, this function returns a UNIX timestamp for March 1st of the same year.Example:
SELECT UTC_USEC_TO_MONTH(1274259481071200);Returns:1272672000000000
UTC_USEC_TO_WEEK(<unix_timestamp>,
<day_of_week>)Returns a UNIX timestamp in microseconds that represents a day in the week of the
unix_timestampargument. This function takes two arguments: a UNIX timestamp in microseconds, and a day of the week from 0 (Sunday) to 6 (Saturday).For example, if
unix_timestampoccurs on Friday, 2008-04-11, and you setday_of_weekto 2 (Tuesday), the function returns a UNIX timestamp for Tuesday, 2008-04-08.Example:
SELECT UTC_USEC_TO_WEEK(1207929480000000, 2) AS tuesday;Returns:1207612800000000
UTC_USEC_TO_YEAR(<unix_timestamp>)Returns a UNIX timestamp in microseconds that represents the year of the
unix_timestampargument.For example, if
unix_timestampoccurs in 2010, the function returns1274259481071200, the microsecond representation of2010-01-01 00:00.Example:
SELECT UTC_USEC_TO_YEAR(1274259481071200);Returns:1262304000000000
WEEK(<timestamp>)Returns the week of a TIMESTAMP data type as an integer between 1 and 53, inclusively.
Weeks begin on Sunday, so if January 1 is on a day other than Sunday, week 1 has fewer than 7 days and the first Sunday of the year is the first day of week 2.
Example:
SELECT WEEK(TIMESTAMP('2014-12-31'));Returns:53
YEAR(<timestamp>)- Returns the year of a TIMESTAMP data type.
Example:
SELECT YEAR(TIMESTAMP('2012-10-02 05:23:48'));Returns:2012
Advanced examples
Convert integer timestamp results into human-readable format
The following query finds the top 5 moments in time in which the most Wikipedia revisions took place. In order to display results in a human-readableformat, use BigQuery's
FORMAT_UTC_USEC()function, which takes a timestamp, in microseconds, as an input. This query multiplies the Wikipedia POSIX format timestamps (in seconds) by 1000000 to convert the value into microseconds.Example:
#legacySQLSELECT/* Multiply timestamp by 1000000 and convert *//* into a more human-readable format. */TOP(FORMAT_UTC_USEC(timestamp*1000000),5)AStop_revision_time,COUNT(*)ASrevision_countFROM[bigquery-public-data:samples.wikipedia];
Returns:
+----------------------------+----------------+| top_revision_time | revision_count |+----------------------------+----------------+| 2002-02-25 15:51:15.000000 | 20976 || 2002-02-25 15:43:11.000000 | 15974 || 2010-02-02 03:34:51.000000 | 3 || 2010-02-02 01:04:59.000000 | 3 || 2010-02-01 23:55:05.000000 | 3 |+----------------------------+----------------+
Bucketing Results by Timestamp
It's useful to use date and time functions to group query results into buckets corresponding to particular years, months, or days. The following example uses the
UTC_USEC_TO_MONTH()function to display how many characters each Wikipedia contributor uses in their revision comments per month.Example:
#legacySQLSELECTcontributor_username,/* Return the timestamp shifted to the * start of the month, formatted in * a human-readable format. Uses the * 'LEFT()' string function to return only * the first 7 characters of the formatted timestamp. */LEFT(FORMAT_UTC_USEC(UTC_USEC_TO_MONTH(timestamp*1000000)),7)ASmonth,SUM(LENGTH(comment))astotal_chars_usedFROM[bigquery-public-data:samples.wikipedia]WHERE(contributor_username!=''ANDcontributor_usernameISNOTNULL)ANDtimestamp >1133395200ANDtimestamp <1157068800GROUPBYcontributor_username,monthORDERBYtotal_chars_usedDESC;
Returns (truncated):
+--------------------------------+---------+-----------------------+| contributor_username | month | total_chars_used |+--------------------------------+---------+-----------------------+| Kingbotk | 2006-08 | 18015066 || SmackBot | 2006-03 | 7838365 || SmackBot | 2006-05 | 5148863 || Tawkerbot2 | 2006-05 | 4434348 || Cydebot | 2006-06 | 3380577 |etc ...
IP functions
IP functions convert IP addresses to and from human-readable form.
Syntax
| IP functions | |
|---|---|
FORMAT_IP() | Converts 32 least significant bits ofinteger_value to human-readable IPv4 address string. |
PARSE_IP() | Converts a string representing IPv4 address to unsigned integer value. |
FORMAT_PACKED_IP() | Returns a human-readable IP address in the form10.1.5.23 or2620:0:1009:1:216:36ff:feef:3f. |
PARSE_PACKED_IP() | Returns an IP address inBYTES. |
FORMAT_IP(integer_value)- Converts 32 least significant bits of
integer_valueto human-readable IPv4 address string. For example,FORMAT_IP(1)will return string'0.0.0.1'. PARSE_IP(readable_ip)- Converts a string representing IPv4 address to unsigned integer value. For example,
PARSE_IP('0.0.0.1')will return1. If string is not a valid IPv4 address,PARSE_IPwill returnNULL.
BigQuery supports writing IPv4 and IPv6 addresses in packed strings, as4- or 16-byte binary data in network byte order. The functions described belowsupport parsing the addresses to and from human readable form. These functionswork only on string fields with IPs.
Syntax
FORMAT_PACKED_IP(packed_ip)Returns a human-readable IP address, in the form
10.1.5.23or2620:0:1009:1:216:36ff:feef:3f.Examples:FORMAT_PACKED_IP('0123456789@ABCDE')returns'3031:3233:3435:3637:3839:4041:4243:4445'FORMAT_PACKED_IP('0123')returns'48.49.50.51'
PARSE_PACKED_IP(readable_ip)Returns an IP address inBYTES. If the input string is not a valid IPv4 or IPv6 address,
PARSE_PACKED_IPwill returnNULL.Examples:PARSE_PACKED_IP('48.49.50.51')returns'MDEyMw=='PARSE_PACKED_IP('3031:3233:3435:3637:3839:4041:4243:4445')returns'MDEyMzQ1Njc4OUBBQkNERQ=='
JSON functions
BigQuery's JSON functions give you the ability to find values within your stored JSON data, by usingJSONPath-like expressions.
Storing JSON data can be more flexible than declaring all of your individual fields in your table schema, but can lead to higher costs. When you select data from a JSON string, you are charged for scanning the entire string, which is more expensive than if each field is in a separate column. The query is also slower since the entire string needs to be parsed at query time. But for ad-hoc or rapidly-changing schemas, the flexibility of JSON can be worth the extra cost.
Use JSON functions instead of BigQuery'sregular expression functions if working with structured data, as JSON functions are easier to use.
Syntax
| JSON functions | |
|---|---|
JSON_EXTRACT() | Selects a value according to the JSONPath expression and returns a JSON string. |
JSON_EXTRACT_SCALAR() | Selects a value according to the JSONPath expression and returns a JSON scalar. |
JSON_EXTRACT(json,json_path)Selects a value in
jsonaccording to the JSONPath expressionjson_path.json_pathmust be a string constant. Returns the value in JSON string format.JSON_EXTRACT_SCALAR(json,json_path)Selects a value in
jsonaccording to the JSONPath expressionjson_path.json_pathmust be a string constant. Returns a scalar JSON value.
Logical operators
Logical operators perform binary or ternary logic on expressions. Binary logic returnstrue orfalse. Ternary logic accommodatesNULL values and returnstrue,false, orNULL.
Syntax
| Logical operators | |
|---|---|
expr ANDexpr | Returnstrue if both expressions are true. |
expr ORexpr | Returnstrue if one or both expressions are true. |
NOTexpr | Returnstrue if the expression is false. |
expr ANDexpr- Returns
trueif both expressions are true. - Returns
falseif one or both of the expressions are false. - Returns
NULLif both expressions are NULL or one expression is true and the other is NULL.
- Returns
expr ORexpr- Returns
trueif one or both expressions are true. - Returns
falseif both expressions are false. - Returns
NULLif both expressions are NULL or one expression is false and the other is NULL.
- Returns
NOTexpr- Returns
trueif the expression is false. - Returns
falseif the expression if true. - Returns
NULLif the expression is NULL.
You can use
NOTwith other functions as an negation operator. For example,NOT IN(expr1, expr2)orIS NOT NULL.- Returns
Mathematical functions
Mathematical functions take numeric arguments and return a numeric result. Each argument can be a numeric literal or a numeric value returned by a query. If the mathematical function evaluates to an undefined result, the operation returnsNULL.
Syntax
| Mathematical functions | |
|---|---|
ABS() | Returns the absolute value of the argument. |
ACOS() | Returns the arc cosine of the argument. |
ACOSH() | Returns the arc hyperbolic cosine of the argument. |
ASIN() | Returns the arc sine of the argument. |
ASINH() | Returns the arc hyperbolic sine of the argument. |
ATAN() | Returns the arc tangent of the argument. |
ATANH() | Returns the arc hyperbolic tangent of the argument. |
ATAN2() | Returns the arc tangent of the two arguments. |
CEIL() | Rounds the argument up to the nearest whole number and returns the rounded value. |
COS() | Returns the cosine of the argument. |
COSH() | Returns the hyperbolic cosine of the argument. |
DEGREES() | Converts from radians to degrees. |
EXP() | Returnse to the power of the argument. |
FLOOR() | Rounds the argument down to the nearest whole number. |
LN()LOG() | Returns the natural logarithm of the argument. |
LOG2() | Returns the Base-2 logarithm of the argument. |
LOG10() | Returns the Base-10 logarithm of the argument. |
PI() | Returns the constant π. |
POW() | Returns first argument to the power of the second argument. |
RADIANS() | Converts from degrees to radians. |
RAND() | Returns a random float value in the range 0.0 <= value < 1.0. |
ROUND() | Rounds the argument either up or down to the nearest whole number. |
SIN() | Returns the sine of the argument. |
SINH() | Returns the hyperbolic sine of the argument. |
SQRT() | Returns the square root of the expression. |
TAN() | Returns the tangent of the argument. |
TANH() | Returns the hyperbolic tangent of the argument. |
ABS(numeric_expr)- Returns the absolute value of the argument.
ACOS(numeric_expr)- Returns the arc cosine of the argument.
ACOSH(numeric_expr)- Returns the arc hyperbolic cosine of the argument.
ASIN(numeric_expr)- Returns the arc sine of the argument.
ASINH(numeric_expr)- Returns the arc hyperbolic sine of the argument.
ATAN(numeric_expr)- Returns the arc tangent of the argument.
ATANH(numeric_expr)- Returns the arc hyperbolic tangent of the argument.
ATAN2(numeric_expr1,numeric_expr2)- Returns the arc tangent of the two arguments.
CEIL(numeric_expr)- Rounds the argument up to the nearest whole number and returns the rounded value.
COS(numeric_expr)- Returns the cosine of the argument.
COSH(numeric_expr)- Returns the hyperbolic cosine of the argument.
DEGREES(numeric_expr)- Returns
numeric_expr, converted from radians to degrees. EXP(numeric_expr)- Returns the result of raising the constant "e" - the base of the natural logarithm - to the power ofnumeric_expr.
FLOOR(numeric_expr)- Rounds the argument down to the nearest whole number and returns the rounded value.
LN(numeric_expr)LOG(numeric_expr)- Returns the natural logarithm of the argument.
LOG2(numeric_expr)- Returns the Base-2 logarithm of the argument.
LOG10(numeric_expr)- Returns the Base-10 logarithm of the argument.
PI()- Returns the constant π. The
PI()function requires parentheses to signify that it is a function, but takes no arguments in those parentheses. You can usePI()like a constant with mathematical and arithmetic functions. POW(numeric_expr1,numeric_expr2)- Returns the result of raising
numeric_expr1to the power ofnumeric_expr2. RADIANS(numeric_expr)- Returns
numeric_expr, converted from degrees to radians. (Note that π radians equals 180 degrees.) RAND([int32_seed])- Returns a random float value in the range 0.0 <= value < 1.0. Each
int32_seedvalue always generates the same sequence of random numbers within a given query, as long as you don't use aLIMITclause. Ifint32_seedis not specified, BigQuery uses the current timestamp as the seed value. ROUND(numeric_expr [,digits])- Rounds the argument either up or down to the nearest whole number (or if specified, to the specified number of digits) and returns the rounded value.
SIN(numeric_expr)- Returns the sine of the argument.
SINH(numeric_expr)- Returns the hyperbolic sine of the argument.
SQRT(numeric_expr)- Returns the square root of the expression.
TAN(numeric_expr)- Returns the tangent of the argument.
TANH(numeric_expr)- Returns the hyperbolic tangent of the argument.
Advanced examples
Bounding box query
The following query returns a collection of points within a rectangular bounding box centered around San Francisco (37.46, -122.50).
Example:
#legacySQLSELECTyear,month,AVG(mean_temp)avg_temp,MIN(min_temperature)min_temp,MAX(max_temperature)max_tempFROM[weather_geo.table]WHERE/* Return values between a pair of *//* latitude and longitude coordinates */lat/1000 >37.46ANDlat/1000 <37.65ANDlong/1000 >-122.50ANDlong/1000 <-122.30GROUPBYyear,monthORDERBYyear,monthASC;
Approximate Bounding Circle Query
Return a collection of up to 100 points within an approximated circle determined by the using theSpherical Law of Cosines, centered around Denver Colorado (39.73, -104.98). This query makes use of BigQuery's mathematical and trigonometric functions, such as
PI(),SIN(), andCOS().Because the Earth isn't an absolute sphere, and longitude+latitude converges at the poles, this query returns an approximation that can be useful for many types of data.
Example:
#legacySQLSELECTdistance,lat,long,tempFROM(SELECT((ACOS(SIN(39.73756700*PI()/180)*SIN((lat/1000)*PI()/180)+COS(39.73756700*PI()/180)*COS((lat/1000)*PI()/180)*COS((-104.98471790-(long/1000))*PI()/180))*180/PI())*60*1.1515)ASdistance,AVG(mean_temp)AStemp,AVG(lat/1000)lat,AVG(long/1000)longFROM[weather_geo.table]WHEREmonth=1GROUPBYdistance)WHEREdistance <100ORDERBYdistanceASCLIMIT100;
Regular expression functions
BigQuery provides regular expression support using there2 library;see that documentation for itsregular expression syntax.
Note that the regular expressions are global matches; to start matching at the beginning of a word you must use the ^ character.
Syntax
| Regular expression functions | |
|---|---|
REGEXP_MATCH() | Returns true if the argument matches the regular expression. |
REGEXP_EXTRACT() | Returns the portion of the argument that matches the capturing group within the regular expression. |
REGEXP_REPLACE() | Replaces a substring that matches a regular expression. |
REGEXP_MATCH('str','reg_exp')Returns true ifstr matches the regular expression. For string matching without regular expressions, useCONTAINS instead of REGEXP_MATCH.
Example:
#legacySQLSELECTword,COUNT(word)AScountFROM[bigquery-public-data:samples.shakespeare]WHERE(REGEXP_MATCH(word,r'\w\w\'\w\w'))GROUPBYwordORDERBYcountDESCLIMIT3;
Returns:
+-------+-------+| word | count |+-------+-------+| ne'er | 42 || we'll | 35 || We'll | 33 |+-------+-------+
REGEXP_EXTRACT('str', 'reg_exp')Returns the portion ofstr that matches the capturing group within the regular expression.
Example:
#legacySQLSELECTREGEXP_EXTRACT(word,r'(\w\w\'\w\w)')ASfragmentFROM[bigquery-public-data:samples.shakespeare]GROUPBYfragmentORDERBYfragmentLIMIT3;
Returns:
+----------+| fragment |+----------+| NULL || Al'ce || As'es |+----------+
REGEXP_REPLACE('orig_str', 'reg_exp', 'replace_str')Returns a string where any substring oforig_str that matchesreg_exp is replaced withreplace_str. For example, REGEXP_REPLACE ('Hello', 'lo', 'p') returns Help.
Example:
#legacySQLSELECTREGEXP_REPLACE(word,r'ne\'er', 'never') AS expanded_wordFROM [bigquery-public-data:samples.shakespeare]WHERE REGEXP_MATCH(word, r'ne\'er')GROUPBYexpanded_wordORDERBYexpanded_wordLIMIT5;
Returns:
+---------------+| expanded_word |+---------------+| Whenever || never || nevertheless || whenever |+---------------+
Advanced examples
Filter result set by regular expression match
BigQuery's regular expression functions can be used to filter results in a
WHEREclause, as well as to display results in theSELECT. The following example combines both of these regular expression use cases into a single query.Example:
#legacySQLSELECT/* Replace white spaces in the title with underscores. */REGEXP_REPLACE(title,r'\s+','_')ASregexp_title,revisionsFROM(SELECTtitle,COUNT(revision_id)asrevisionsFROM[bigquery-public-data:samples.wikipedia]WHEREwp_namespace=0/* Match titles that start with 'G', end with * 'e', and contain at least two 'o's. */ANDREGEXP_MATCH(title,r'^G.*o.*o.*e$')GROUPBYtitleORDERBYrevisionsDESCLIMIT100);
Using regular expressions on integer or float data
While BigQuery's regular expression functions only work for string data, it's possible to use the
STRING()function to cast integer or float data into string format. In this example,STRING()is used to cast the integer valuecorpus_dateto a string, which is then altered byREGEXP_REPLACE.Example:
#legacySQLSELECTcorpus_date,/* Cast the corpus_date to a string value */REGEXP_REPLACE(STRING(corpus_date),'^16','Written in the sixteen hundreds, in the year \'' ) AS date_stringFROM [bigquery-public-data:samples.shakespeare]/* Cast the corpus_date to string, *//* match values that begin with '16' */WHERE REGEXP_MATCH(STRING(corpus_date), '^16')GROUPBYcorpus_date,date_stringORDERBYdate_stringDESCLIMIT5;
String functions
String functions operate on string data. String constants must be enclosedwith single or double quotes. String functions are case-sensitive by default.You can appendIGNORE CASE to the end of a query to enable case-insensitive matching.IGNORE CASE works only on ASCII charactersand only at the top level of the query.
Wildcards are not supported in these functions; for regular expressionfunctionality, useregular expressionfunctions.
Syntax
| String functions | |
|---|---|
CONCAT() | Returns the concatenation of two or more strings, or NULL if any of the values are NULL. |
expr CONTAINS 'str' | Returnstrue ifexpr contains the specified string argument. |
INSTR() | Returns the one-based index of the first occurrence of a string. |
LEFT() | Returns the leftmost characters of a string. |
LENGTH() | Returns the length of the string. |
LOWER() | Returns the original string with all characters in lower case. |
LPAD() | Inserts characters to the left of a string. |
LTRIM() | Removes characters from the left side of a string. |
REPLACE() | Replaces all occurrences of a substring. |
RIGHT() | Returns the rightmost characters of a string. |
RPAD() | Inserts characters to the right side of a string. |
RTRIM() | Removes trailing characters from the right side of a string. |
SPLIT() | Splits a string into repeated substrings. |
SUBSTR() | Returns a substring ... |
UPPER() | Returns the original string with all characters in upper case. |
CONCAT('str1', 'str2', '...')
str1 +str2 + ...- Returns the concatenation of two or more strings, or NULL if any of the values are NULL.Example: if
str1isJavaandstr2isScript,CONCATreturnsJavaScript. expr CONTAINS 'str'- Returns
trueifexprcontains the specified string argument. This is a case-sensitive comparison. INSTR('str1', 'str2')- Returns the one-based index of the first occurrence ofstr2 instr1, or returns 0 ifstr2 does not occur instr1.
LEFT('str',numeric_expr)- Returns the leftmostnumeric_expr characters of
str. If the number is longer thanstr, the full string will be returned.Example:LEFT('seattle', 3)returnssea. LENGTH('str')- Returns a numerical value for the length of the string.Example: if
stris'123456',LENGTHreturns6. LOWER('str')- Returns the original string with all characters in lower case.
LPAD('str1',numeric_expr, 'str2')- Pads
str1on the left withstr2, repeatingstr2until the result string is exactlynumeric_exprcharacters.Example:LPAD('1', 7, '?')returns??????1. LTRIM('str1' [,str2])Removes characters from the left side ofstr1. Ifstr2 is omitted,
LTRIMremoves spaces from the left side ofstr1. Otherwise,LTRIMremoves any characters instr2 from the left side ofstr1 (case-sensitive).Examples:
SELECT LTRIM("Say hello", "yaS")returns" hello".SELECT LTRIM("Say hello", " ySa")returns"hello".REPLACE('str1', 'str2', 'str3')Replaces all instances ofstr2 withinstr1 withstr3.
RIGHT('str',numeric_expr)- Returns the rightmostnumeric_expr characters of
str. If the number is longer than the string, it will return the whole string.Example:RIGHT('kirkland', 4)returnsland. RPAD('str1',numeric_expr, 'str2')- Pads
str1on the right withstr2, repeatingstr2until the result string is exactlynumeric_exprcharacters.Example:RPAD('1', 7, '?')returns1??????. RTRIM('str1' [,str2])Removes trailing characters from the right side ofstr1. Ifstr2 is omitted,
RTRIMremoves trailing spaces fromstr1. Otherwise,RTRIMremoves any characters instr2 from the right side ofstr1 (case-sensitive).Examples:
SELECT RTRIM("Say hello", "leo")returns"Say h".SELECT RTRIM("Say hello ", " hloe")returns"Say".SPLIT('str' [, 'delimiter'])- Splits a string into repeated substrings. If
delimiteris specified, theSPLITfunction breaksstrinto substrings, usingdelimiteras the delimiter. SUBSTR('str',index [,max_len])- Returns a substring of
str, starting atindex. If the optionalmax_lenparameter is used, the returned string is a maximum ofmax_lencharacters long. Counting starts at 1, so the first character in the string is in position 1 (not zero). Ifindexis5, the substring begins with the 5th character from the left instr. Ifindexis-4, the substring begins with the 4th character from the right instr.Example:SUBSTR('awesome',-4,4)returns the substringsome. UPPER('str')- Returns the original string with all characters in upper case.
Escaping special characters in strings
To escape special characters, use one of the following methods:
- Use
'\xDD'notation, where'\x'is followed by the two-digit hex representation of the character. - Use an escaping slash in front of slashes, single quotes, and double quotes.
- Use C-style sequences (
'\a', '\b', '\f', '\n', '\r', '\t',and'\v') for other characters.
Some examples of escaping:
'this is a space: \x20''this string has \'single quote\' inside it''first line \n second line'"double quotes are also ok"'\070' ->ERROR: octal escaping is not supportedTable wildcard functions
Table wildcard functions are a convenient way to query data from a specificset of tables. A table wildcard function is equivalent to a comma-separatedunion of all the tables matched by the wildcard function. When you use a tablewildcard function, BigQuery only accesses and charges you for tables that matchthe wildcard. Table wildcard functions are specified in the query'sFROM clause.
If you use table wildcard functions in a query, the functions no longer needto be contained in parentheses. For example, some of the following examples useparentheses, whereas others don't.
Cached results are not supported for queries against multiple tablesusing a wildcard function (even if theUse Cached Results option is checked).If you run the same wildcard query multiple times, you are billed for each query.
Syntax
| Table wildcard functions | |
|---|---|
TABLE_DATE_RANGE() | Queries multiple daily tables that span a date range. |
TABLE_DATE_RANGE_STRICT() | Queries multiple daily tables that span a date range, with no missing dates. |
TABLE_QUERY() | Queries tables whose names match a specified predicate. |
TABLE_DATE_RANGE(prefix,timestamp1,timestamp2)Queries daily tables that overlap with the time range between
<timestamp1>and<timestamp2>.Table names must have the following format:
<prefix><day>, where<day>is in the formatYYYYMMDD.You can usedate and time functions to generate the timestamp parameters. For example:
TIMESTAMP('2012-10-01 02:03:04')DATE_ADD(CURRENT_TIMESTAMP(), -7, 'DAY')
Example: get tables between two days
This example assumes the following tables exist:
- mydata.people20140325
- mydata.people20140326
- mydata.people20140327
#legacySQLSELECTnameFROMTABLE_DATE_RANGE([myproject-1234:mydata.people],TIMESTAMP('2014-03-25'),TIMESTAMP('2014-03-27'))WHEREage>=35
Matches the following tables:
- mydata.people20140325
- mydata.people20140326
- mydata.people20140327
Example: get tables in a two-day range up to "now"
This example assumes the following tables exist in a project named
myproject-1234:- mydata.people20140323
- mydata.people20140324
- mydata.people20140325
#legacySQLSELECTnameFROM(TABLE_DATE_RANGE([myproject-1234:mydata.people],DATE_ADD(CURRENT_TIMESTAMP(),-2,'DAY'),CURRENT_TIMESTAMP()))WHEREage>=35
Matches the following tables:
- mydata.people20140323
- mydata.people20140324
- mydata.people20140325
TABLE_DATE_RANGE_STRICT(prefix,timestamp1,timestamp2)This function is equivalent to
TABLE_DATE_RANGE. The only difference is that if any daily table is missing in the sequence,TABLE_DATE_RANGE_STRICTfails and returns aNot Found: Table<table_name>error.Example: error on missing table
This example assumes the following tables exist:
- people20140325
- people20140327
#legacySQLSELECTnameFROM(TABLE_DATE_RANGE_STRICT([myproject-1234:mydata.people],TIMESTAMP('2014-03-25'),TIMESTAMP('2014-03-27')))WHEREage>=35
The above example returns an error "Not Found" for the table "people20140326".
TABLE_QUERY(dataset,expr)Queries tables whose names match the supplied
expr. Theexprparameter must be represented as a string and must contain an expression to evaluate. For example,'length(table_id) < 3'.Example: match tables whose names contain "oo" and have a length greater than 4
This example assumes the following tables exist:
- mydata.boo
- mydata.fork
- mydata.ooze
- mydata.spoon
#legacySQLSELECTspeedFROM(TABLE_QUERY([myproject-1234:mydata],'table_id CONTAINS "oo" AND length(table_id) >= 4'))
Matches the following tables:
- mydata.ooze
- mydata.spoon
Example: match tables whose names start with "boo", followed by 3-5 numeric digits
This example assumes the following tables exist in a project named
myproject-1234:- mydata.book4
- mydata.book418
- mydata.boom12345
- mydata.boom123456789
- mydata.taboo999
#legacySQLSELECTspeedFROMTABLE_QUERY([myproject-1234:mydata],'REGEXP_MATCH(table_id, r"^boo[\d]{3,5}")')
Matches the following tables:
- mydata.book418
- mydata.boom12345
URL functions
Syntax
| URL functions | |
|---|---|
HOST() | Given a URL, returns the host name as a string. |
DOMAIN() | Given a URL, returns the domain as a string. |
TLD() | Given a URL, returns the top level domain plus any country domain in the URL. |
HOST('url_str')- Given a URL, returns the host name as a string.Example: HOST('http://www.google.com:80/index.html') returns 'www.google.com'
DOMAIN('url_str')- Given a URL, returns the domain as a string.Example: DOMAIN('http://www.google.com:80/index.html') returns 'google.com'.
TLD('url_str')- Given a URL, returns the top level domain plus any country domain in the URL.Example: TLD('http://www.google.com:80/index.html') returns '.com'. TLD('http://www.google.co.uk:80/index.html') returns '.co.uk'.
Notes:
- These functions don't perform reverseDNS lookup, so if you call these functions using an IP address the functionswill return segments of the IP address rather than segments of the host name.
- All of the URL parsing functions expect lower-case characters. Upper-case charactersin the URL will result in a NULL or otherwise incorrect result. Consider passing input to thisfunction through LOWER() if your data has mixed casing.
Advanced example
Parse domain names from URL data
This query uses theDOMAIN() function to return the most popular domains listed as repository homepages on GitHub. Note the use of HAVING to filter records using the result of theDOMAIN() function. This is a useful function to determine referrer information from URL data.
Examples:
#legacySQLSELECTDOMAIN(repository_homepage)ASuser_domain,COUNT(*)ASactivity_countFROM[bigquery-public-data:samples.github_timeline]GROUPBYuser_domainHAVINGuser_domainISNOTNULLANDuser_domain!=''ORDERBYactivity_countDESCLIMIT5;
Returns:
+-----------------+----------------+| user_domain | activity_count |+-----------------+----------------+| github.com | 281879 || google.com | 34769 || khanacademy.org | 17316 || sourceforge.net | 15103 || mozilla.org | 14091 |+-----------------+----------------+
To look specifically at TLD information, use theTLD() function. Thisexample displays the top TLDs that are not in a list of common examples.
#legacySQLSELECTTLD(repository_homepage)ASuser_tld,COUNT(*)ASactivity_countFROM[bigquery-public-data:samples.github_timeline]GROUPBYuser_tldHAVING/* Only consider TLDs that are NOT NULL *//* or in our list of common TLDs */user_tldISNOTNULLANDNOTuser_tldIN('','.com','.net','.org','.info','.edu')ORDERBYactivity_countDESCLIMIT5;
Returns:
+----------+----------------+| user_tld | activity_count |+----------+----------------+| .de | 22934 || .io | 17528 || .me | 13652 || .fr | 12895 || .co.uk | 9135 |+----------+----------------+
Window functions
Window functions, also known as analytic functions, enable calculations on a specific subset, or "window", of a result set. Window functions make it easier to create reports that include complex analytics such as trailing averages and running totals.
Each window function requires anOVER clause that specifies the window top and bottom. The three components of theOVER clause (partitioning, ordering, and framing) provide additional control over the window. Partitioning enables you to divide the input data into logical groups that have a common characteristic. Ordering enables you to order the results within a partition. Framing enables you to create a sliding window frame within a partition that moves relative to the current row. You can configure the size of the moving window frame based on a number of rows or a range of values, such as a time interval.
#legacySQLSELECT<window_function>OVER([PARTITIONBY<expr>][ORDERBY<expr>[ASC|DESC]][<window-frame-clause>])
PARTITION BY- Defines the base partition over which this function operates. Specify one or more comma-separated column names; one partition will be created for each distinct set of values for these columns, similar to a
GROUP BYclause. IfPARTITION BYis omitted, the base partition is all rows in the input to the window function. - The
PARTITION BYclause also allows window functions to partition data and parallelize execution. If you wish to use a window function withallowLargeResults, or if you intend to apply further joins or aggregations to the output of your window function, usePARTITION BYto parallelize execution. JOIN EACHandGROUP EACH BYclauses can't be used on the output of window functions. To generatelarge query results when using window functions, you must usePARTITION BY.ORDER BY- Sorts the partition. If
ORDER BYis absent, there is no guarantee of any default sorting order. Sorting occurs at the partition level, before any window frame clause is applied. If you specify aRANGEwindow, you should add anORDER BYclause. Default order isASC. ORDER BYis optional in some cases, but certain window functions, such asrank() ordense_rank(), require the clause.- If you use
ORDER BYwithout specifyingROWSorRANGE,ORDER BYimplies that the window extends from the beginning of the partition to the current row. In the absence of anORDER BYclause, the window is the entire partition. <window-frame-clause>{ROWS | RANGE} {BETWEEN <start> AND <end> | <start> | <end>}- A subset of the partition over which to operate. This can be the same size as the partition or smaller. If you use
ORDER BYwithout awindow-frame-clause, the default window frame isRANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. If you omit bothORDER BYand thewindow-frame-clause, the default window frame is the entire partition.ROWS- Defines a window in terms of row position, relative to the current row. For example, to add a column showing the sum of the preceding 5 rows of salary values, you would querySUM(salary) OVER (ROWS BETWEEN 5 PRECEDING AND CURRENT ROW). The set of rows typically includes the current row, but that is not required.RANGE- Defines a window in terms of a range of values in a given column, relative to that column's value in the current row. Only operates on numbers and dates, where date values are simple integers (microseconds since the epoch). Neighboring rows with the same value are calledpeer rows. Peer rows of theCURRENT ROWare included in a window frame that specifiesCURRENT ROW. For example, if you specify the window end to beCURRENT ROWand the following row in the window has the same value, it will be included in the function calculation.BETWEEN <start> AND <end>- A range, inclusive of the start and end rows. The range need not include the current row, but<start>must precede or equal<end>.<start>- Specifies the start offset for this window, relative to the current row. The following options are supported: where{UNBOUNDED PRECEDING | CURRENT ROW | <expr> PRECEDING | <expr> FOLLOWING}<expr>is a positive integer,PRECEDINGindicates a preceding row number or range value, andFOLLOWINGindicates a following row number or range value.UNBOUNDED PRECEDINGmeans the first row of the partition. If the start precedes the window, it will be set to the first row of the partition.<end>- Specifies the end offset for this window, relative to the current row. The following options are supported: where{UNBOUNDED FOLLOWING | CURRENT ROW | <expr> PRECEDING | <expr> FOLLOWING}<expr>is a positive integer,PRECEDINGindicates a preceding row number or range value, andFOLLOWINGindicates a following row number or range value.UNBOUNDED FOLLOWINGmeans the last row of the partition. If end is beyond the end of the window, it will be set to the last row of the partition.
Unlike aggregation functions, which collapse many input rows into one output row, window functions return one row of output for each row of input. This feature makes it easier to create queries that calculate running totals and moving averages. For example, the following query returns a running total for a small dataset of five rows defined bySELECT statements:
#legacySQLSELECTname,value,SUM(value)OVER(ORDERBYvalue)ASRunningTotalFROM(SELECT"a"ASname,0ASvalue),(SELECT"b"ASname,1ASvalue),(SELECT"c"ASname,2ASvalue),(SELECT"d"ASname,3ASvalue),(SELECT"e"ASname,4ASvalue);
Return value:
+------+-------+--------------+| name | value | RunningTotal |+------+-------+--------------+| a | 0 | 0 || b | 1 | 1 || c | 2 | 3 || d | 3 | 6 || e | 4 | 10 |+------+-------+--------------+
The following example calculates a moving average of the values in thecurrent row and the row preceding it. The window frame comprises two rowsthat move with the current row.
#legacySQLSELECTname,value,AVG(value)OVER(ORDERBYvalueROWSBETWEEN1PRECEDINGANDCURRENTROW)ASMovingAverageFROM(SELECT"a"ASname,0ASvalue),(SELECT"b"ASname,1ASvalue),(SELECT"c"ASname,2ASvalue),(SELECT"d"ASname,3ASvalue),(SELECT"e"ASname,4ASvalue);
Return value:
+------+-------+---------------+| name | value | MovingAverage |+------+-------+---------------+| a | 0 | 0.0 || b | 1 | 0.5 || c | 2 | 1.5 || d | 3 | 2.5 || e | 4 | 3.5 |+------+-------+---------------+
Syntax
| Window functions | |
|---|---|
AVG()COUNT(*)COUNT([DISTINCT])MAX()MIN()STDDEV()SUM() | The same operation as the correspondingAggregate functions, but are computed over a window defined by the OVER clause. |
CUME_DIST() | Returns a double that indicates the cumulative distribution of a value in a group of values ... |
DENSE_RANK() | Returns the integer rank of a value in a group of values. |
FIRST_VALUE() | Returns the first value of the specified field in the window. |
LAG() | Enables you to read data from a previous row within a window. |
LAST_VALUE() | Returns the last value of the specified field in the window. |
LEAD() | Enables you to read data from a following row within a window. |
NTH_VALUE() | Returns the value of<expr> at position<n> of the window frame ... |
NTILE() | Divides the window into the specified number of buckets. |
PERCENT_RANK() | Returns the rank of the current row, relative to the other rows in the partition. |
PERCENTILE_CONT() | Returns an interpolated value that would map to the percentile argument with respect to the window ... |
PERCENTILE_DISC() | Returns the value nearest the percentile of the argument over the window. |
RANK() | Returns the integer rank of a value in a group of values. |
RATIO_TO_REPORT() | Returns the ratio of each value to the sum of the values. |
ROW_NUMBER() | Returns the current row number of the query result over the window. |
AVG(numeric_expr)COUNT(*)COUNT([DISTINCT]field)MAX(field)MIN(field)STDDEV(numeric_expr)SUM(field)- These window functions perform the same operation as the correspondingAggregate functions, but are computed over a window defined by the OVER clause.
Another significant difference is that the
COUNT([DISTINCT]field)function produces exact results when used as a window function, with behavior similar to theEXACT_COUNT_DISTINCT()aggregate function.In the example query, the
ORDER BYclause causes the window to be computed from the start of the partition to the current row, which generates a cumulative sum for that year.#legacySQLSELECTcorpus_date,corpus,word_count,SUM(word_count)OVER(PARTITIONBYcorpus_dateORDERBYword_count)annual_totalFROM[bigquery-public-data:samples.shakespeare]WHEREword='love'ORDERBYcorpus_date,word_count
Returns:
corpus_date corpus word_count annual_total 0 various 37 37 0 sonnets 157 194 1590 2kinghenryvi 18 18 1590 1kinghenryvi 24 42 1590 3kinghenryvi 40 82 CUME_DIST()Returns a double that indicates the cumulative distribution of a value in a group of values, calculated using the formula
<number of rows preceding or tied with the current row> /<total rows>. Tied values return the same cumulative distribution value.This window function requires
ORDER BYin theOVERclause.#legacySQLSELECTword,word_count,CUME_DIST()OVER(PARTITIONBYcorpusORDERBYword_countDESC)cume_dist,FROM[bigquery-public-data:samples.shakespeare]WHEREcorpus='othello'andlength(word) >10LIMIT5
Returns:
word word_count cume_dist handkerchief 29 0.2 satisfaction 5 0.4 displeasure 4 0.8 instruments 4 0.8 circumstance 3 1.0 DENSE_RANK()Returns the integer rank of a value in a group of values. The rank is calculated based on comparisons with other values in the group.
Tied values display as the same rank. The rank of the next value is incremented by 1. For example, if two values tie for rank 2, the next ranked value is 3. If you prefer a gap in the ranking list, userank().
This window function requires
ORDER BYin theOVERclause. Returns:#legacySQLSELECTword,word_count,DENSE_RANK()OVER(PARTITIONBYcorpusORDERBYword_countDESC)dense_rank,FROM[bigquery-public-data:samples.shakespeare]WHEREcorpus='othello'andlength(word) >10LIMIT5
word word_count dense_rank handkerchief 29 1 satisfaction 5 2 displeasure 4 3 instruments 4 3 circumstance 3 4 FIRST_VALUE(<field_name>)Returns the first value of
<field_name>in the window. Returns:#legacySQLSELECTword,word_count,FIRST_VALUE(word)OVER(PARTITIONBYcorpusORDERBYword_countDESC)fv,FROM[bigquery-public-data:samples.shakespeare]WHEREcorpus='othello'andlength(word) >10LIMIT1
word word_count fv imperfectly 1 imperfectly LAG(<expr>[,<offset>[,<default_value>]])Enables you to read data from a previous row within a window. Specifically,
LAG()returns the value of<expr>for the row located<offset>rows before the current row. If the row doesn't exist,<default_value>returns.#legacySQLSELECTword,word_count,LAG(word,1)OVER(PARTITIONBYcorpusORDERBYword_countDESC)lag,FROM[bigquery-public-data:samples.shakespeare]WHEREcorpus='othello'andlength(word) >10LIMIT5
Returns:
word word_count lag handkerchief 29 null satisfaction 5 handkerchief displeasure 4 satisfaction instruments 4 displeasure circumstance 3 instruments LAST_VALUE(<field_name>)Returns the last value of
<field_name>in the window.#legacySQLSELECTword,word_count,LAST_VALUE(word)OVER(PARTITIONBYcorpusORDERBYword_countDESC)lv,FROM[bigquery-public-data:samples.shakespeare]WHEREcorpus='othello'andlength(word) >10LIMIT1
Returns:
word word_count lv imperfectly 1 imperfectly LEAD(<expr>[,<offset>[,<default_value>]])Enables you to read data from a following row within a window. Specifically,
LEAD()returns the value of<expr>for the row located<offset>rows after the current row. If the row doesn't exist,<default_value>returns. Returns:#legacySQLSELECTword,word_count,LEAD(word,1)OVER(PARTITIONBYcorpusORDERBYword_countDESC)lead,FROM[bigquery-public-data:samples.shakespeare]WHEREcorpus='othello'andlength(word) >10LIMIT5
word word_count lead handkerchief 29 satisfaction satisfaction 5 displeasure displeasure 4 instruments instruments 4 circumstance circumstance 3 null NTH_VALUE(<expr>,<n>)Returns the value of
<expr>at position<n>of the window frame, where<n>is a one-based index.NTILE(<num_buckets>)Divides a sequence of rows into
<num_buckets>buckets and assigns a corresponding bucket number, as an integer, with each row. Thentile()function assigns the bucket numbers as equally as possible and returns a value from 1 to<num_buckets>for each row. Returns:#legacySQLSELECTword,word_count,NTILE(2)OVER(PARTITIONBYcorpusORDERBYword_countDESC)ntile,FROM[bigquery-public-data:samples.shakespeare]WHEREcorpus='othello'andlength(word) >10LIMIT5
word word_count ntile handkerchief 29 1 satisfaction 5 1 displeasure 4 1 instruments 4 2 circumstance 3 2 PERCENT_RANK()Returns the rank of the current row, relative to the other rows in the partition. Returned values range between 0 and 1, inclusively. The first value returned is 0.0.
This window function requires
ORDER BYin theOVERclause. Returns:#legacySQLSELECTword,word_count,PERCENT_RANK()OVER(PARTITIONBYcorpusORDERBYword_countDESC)p_rank,FROM[bigquery-public-data:samples.shakespeare]WHEREcorpus='othello'andlength(word) >10LIMIT5
word word_count p_rank handkerchief 29 0.0 satisfaction 5 0.25 displeasure 4 0.5 instruments 4 0.5 circumstance 3 1.0 PERCENTILE_CONT(<percentile>)Returns an interpolated value that would map to the percentile argument with respect to the window, after ordering them per the
ORDER BYclause.<percentile>must be between 0 and 1.This window function requires
ORDER BYin theOVERclause. Returns:#legacySQLSELECTword,word_count,PERCENTILE_CONT(0.5)OVER(PARTITIONBYcorpusORDERBYword_countDESC)p_cont,FROM[bigquery-public-data:samples.shakespeare]WHEREcorpus='othello'andlength(word) >10LIMIT5
word word_count p_cont handkerchief 29 4 satisfaction 5 4 displeasure 4 4 instruments 4 4 circumstance 3 4 PERCENTILE_DISC(<percentile>)Returns the value nearest the percentile of the argument over the window.
<percentile>must be between 0 and 1.This window function requires
ORDER BYin theOVERclause. Returns:#legacySQLSELECTword,word_count,PERCENTILE_DISC(0.5)OVER(PARTITIONBYcorpusORDERBYword_countDESC)p_disc,FROM[bigquery-public-data:samples.shakespeare]WHEREcorpus='othello'andlength(word) >10LIMIT5
word word_count p_disc handkerchief 29 4 satisfaction 5 4 displeasure 4 4 instruments 4 4 circumstance 3 4 RANK()Returns the integer rank of a value in a group of values. The rank is calculated based on comparisons with other values in the group.
Tied values display as the same rank. The rank of the next value is incremented according to how many tied values occurred before it. For example, if two values tie for rank 2, the next ranked value is 4, not 3. If you prefer no gaps in the ranking list, usedense_rank().
This window function requires
ORDER BYin theOVERclause. Returns:#legacySQLSELECTword,word_count,RANK()OVER(PARTITIONBYcorpusORDERBYword_countDESC)rank,FROM[bigquery-public-data:samples.shakespeare]WHEREcorpus='othello'andlength(word) >10LIMIT5
word word_count rank handkerchief 29 1 satisfaction 5 2 displeasure 4 3 instruments 4 3 circumstance 3 5 RATIO_TO_REPORT(<column>)Returns the ratio of each value to the sum of the values, as a double between 0 and 1.
Returns:#legacySQLSELECTword,word_count,RATIO_TO_REPORT(word_count)OVER(PARTITIONBYcorpusORDERBYword_countDESC)r_to_r,FROM[bigquery-public-data:samples.shakespeare]WHEREcorpus='othello'andlength(word) >10LIMIT5
word word_count r_to_r handkerchief 29 0.6444444444444445 satisfaction 5 0.1111111111111111 displeasure 4 0.08888888888888889 instruments 4 0.08888888888888889 circumstance 3 0.06666666666666667 ROW_NUMBER()Returns the current row number of the query result over the window, starting with 1.
Returns:#legacySQLSELECTword,word_count,ROW_NUMBER()OVER(PARTITIONBYcorpusORDERBYword_countDESC)row_num,FROM[bigquery-public-data:samples.shakespeare]WHEREcorpus='othello'andlength(word) >10LIMIT5
word word_count row_num handkerchief 29 1 satisfaction 5 2 displeasure 4 3 instruments 4 4 circumstance 3 5
Other functions
Syntax
| Other functions | |
|---|---|
CASE WHEN ... THEN | Use CASE to choose among two or more alternate expressions in your query. |
CURRENT_USER() | Returns the email address of the user running the query. |
EVERY() | Returns true if the argument is true for all of its inputs. |
FROM_BASE64() | Converts the base-64 encoded input string into BYTES format. |
HASH() | Computes and returns a 64-bit signed hash value ... |
FARM_FINGERPRINT() | Computes and returns a 64-bit signed fingerprint value ... |
IF() | If first argument is true, returns second argument; otherwise returns third argument. |
POSITION() | Returns the one-based, sequential position of the argument. |
SHA1() | Returns aSHA1 hash, in BYTES format. |
SOME() | Returns true if argument is true for at least one of its inputs. |
TO_BASE64() | Converts the BYTES argument to a base-64 encoded string. |
CASE WHEN when_expr1 THEN then_expr1
WHEN when_expr2 THEN then_expr2 ...
ELSE else_expr END- Use CASE to choose among two or more alternate expressions in your query. WHEN expressions must be boolean, and all the expressions in THEN clauses and ELSE clause must be compatible types.
CURRENT_USER()- Returns the email address of the user running the query.
EVERY(<condition>)- Returns
trueifconditionis true for all of its inputs. When used with theOMIT IFclause, this function is useful for queries that involve repeated fields. FROM_BASE64(<str>)- Converts the base64-encoded input string
strintoBYTES format. To convert BYTES to a base64-encoded string, useTO_BASE64(). HASH(expr)- Computes and returns a 64-bit signed hash value of the bytes of
expras defined by theCityHash library (version 1.0.3). Any string or integer expression is supported and the function respectsIGNORE CASEfor strings, returning case invariant values. FARM_FINGERPRINT(expr)- Computes and returns a 64-bit signed fingerprint value of the
STRINGorBYTESinput using theFingerprint64function from theopen-source FarmHash library. The output of this function for a particular input will never change and matches the output of theFARM_FINGERPRINTfunction when usingGoogleSQL. RespectsIGNORE CASEfor strings, returning case invariant values. IF(condition,true_return,false_return)- Returns either
true_returnorfalse_return, depending on whetherconditionis true or false. The return values can be literals or field-derived values, but they must be the same data type. Field-derived values do not need to be included in theSELECTclause. POSITION(field)- Returns the one-based, sequential position offield within a set of repeated fields.
SHA1(<str>)- Returns aSHA1 hash, in BYTES format, of the input string
str. You can convert the result to base64 using TO_BASE64(). For example:#legacySQLSELECTTO_BASE64(SHA1(corpus))FROM[bigquery-public-data:samples.shakespeare]LIMIT100;
SOME(<condition>)- Returns
trueifconditionis true for at least one of its inputs. When used with theOMIT IFclause, this function is useful for queries that involve repeated fields. TO_BASE64(<bin_data>)- Converts theBYTES input
bin_datato a base64-encoded string. For example: To convert a base64-encoded string to BYTES, useFROM_BASE64().#legacySQLSELECTTO_BASE64(SHA1(title))FROM[bigquery-public-data:samples.wikipedia]LIMIT100;
Advanced examples
Bucketing results into categories using conditionals
The following query uses a
CASE/WHENblock to bucket results into "region" categories based on a list of states. If the state does not appear as an option in one of theWHENstatements, the state value will default to "None."Example:
#legacySQLSELECTCASEWHENstateIN('WA','OR','CA','AK','HI','ID','MT','WY','NV','UT','CO','AZ','NM')THEN'West'WHENstateIN('OK','TX','AR','LA','TN','MS','AL','KY','GA','FL','SC','NC','VA','WV','MD','DC','DE')THEN'South'WHENstateIN('ND','SD','NE','KS','MN','IA','MO','WI','IL','IN','MI','OH')THEN'Midwest'WHENstateIN('NY','PA','NJ','CT','RI','MA','VT','NH','ME')THEN'Northeast'ELSE'None'ENDasregion,average_mother_age,average_father_age,state,yearFROM(SELECTyear,state,SUM(mother_age)/COUNT(mother_age)asaverage_mother_age,SUM(father_age)/COUNT(father_age)asaverage_father_ageFROM[bigquery-public-data:samples.natality]WHEREfather_age <99GROUPBYyear,state)ORDERBYyearLIMIT5;
Returns:
+--------+--------------------+--------------------+-------+------+| region | average_mother_age | average_father_age | state | year |+--------+--------------------+--------------------+-------+------+| South | 24.342600163532296 | 27.683769419460344 | AR | 1969 || West | 25.185041908446163 | 28.268214055448098 | AK | 1969 || West | 24.780776677578217 | 27.831181063905248 | CA | 1969 || West | 25.005834769924412 | 27.942978384829598 | AZ | 1969 || South | 24.541730952905738 | 27.686430093306885 | AL | 1969 |+--------+--------------------+--------------------+-------+------+
Simulating a Pivot Table
Use conditional statements to organize the results of a subselect query into rows and columns. In the example below, results from a search for most revised Wikipedia articles that start with the value 'Google' are organized into columns where the revision counts are displayed if they meet various criteria.
Example:
#legacySQLSELECTpage_title,/* Populate these columns as True or False, *//* depending on the condition */IF(page_titleCONTAINS'search',INTEGER(total),0)ASsearch,IF(page_titleCONTAINS'Earth'ORpage_titleCONTAINS'Maps',INTEGER(total),0)ASgeo,FROM/* Subselect to return top revised Wikipedia articles *//* containing 'Google', followed by additional text. */(SELECTTOP(title,5)aspage_title,COUNT(*)astotalFROM[bigquery-public-data:samples.wikipedia]WHEREREGEXP_MATCH(title,r'^Google.+')ANDwp_namespace=0);
Returns:
+---------------+--------+------+| page_title | search | geo |+---------------+--------+------+| Google search | 4261 | 0 || Google Earth | 0 | 3874 || Google Chrome | 0 | 0 || Google Maps | 0 | 2617 || Google bomb | 0 | 0 |+---------------+--------+------+
Using HASH to select a random sample of your data
Some queries can provide a useful result using random subsampling of the result set. To retrieve a random sampling of values, use the
HASHfunction to return results in which the modulo "n" of the hash equals zero.For example, the following query will find the
HASH()of the "title" value, and then checks if that value modulo "2" is zero. This should result in about 50% of the values being labeled as "sampled." To sample fewer values, increase the value of the modulo operation from "2" to something larger. The query uses theABSfunction in combination withHASH, becauseHASHcan return negative values, and themodulo operator on a negative value yields a negative value.Example:
#legacySQLSELECTtitle,HASH(title)AShash_value,IF(ABS(HASH(title))%2==1,'True','False')ASincluded_in_sampleFROM[bigquery-public-data:samples.wikipedia]WHEREwp_namespace=0LIMIT5;
Except as otherwise noted, the content of this page is licensed under theCreative Commons Attribution 4.0 License, and code samples are licensed under theApache 2.0 License. For details, see theGoogle Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2026-02-19 UTC.