User-defined functions in legacy SQL
This document details how to use JavaScript user-defined functions in legacy SQL query syntax. The preferred query syntax for BigQuery is GoogleSQL. For information on user-defined functions in GoogleSQL, seeGoogleSQL user-defined functions.
BigQuery legacy SQL supports user-defined functions (UDFs) written in JavaScript. A UDF is similar to the "Map" function in a MapReduce: it takes a single row as input and produces zero or more rows as output. The output can potentially have a different schema than the input.
For information on user-defined functions in GoogleSQL, seeUser-defined functions in GoogleSQL.
UDF example
//UDFdefinitionfunctionurlDecode(row,emit){emit({title:decodeHelper(row.title),requests:row.num_requests});}//HelperfunctionwitherrorhandlingfunctiondecodeHelper(s){try{returndecodeURI(s);}catch(ex){returns;}}//UDFregistrationbigquery.defineFunction('urlDecode',//NameusedtocallthefunctionfromSQL['title','num_requests'],//Inputcolumnnames//JSONrepresentationoftheoutputschema[{name:'title',type:'string'},{name:'requests',type:'integer'}],urlDecode//Thefunctionreference);
UDF structure
functionname(row,emit){emit(<outputdata>);}
BigQuery UDFs operate on individual rows of a table or subselect query results. The UDF has two formal parameters:
row: an input row.emit: a hook used by BigQuery to collect output data. Theemitfunction takes one parameter: a JavaScript object that represents a single row of output data. Theemitfunction can be called more than once, such as in a loop, to output multiple rows of data.
The following code example shows a basic UDF.
functionurlDecode(row,emit){emit({title:decodeURI(row.title),requests:row.num_requests});}
Registering the UDF
You must register a name for your function so that it can be invoked from BigQuery SQL. The registered name doesn't have to match the name you used for your function in JavaScript.
bigquery.defineFunction('<UDF name>',//NameusedtocallthefunctionfromSQL['<col1>','<col2>'],//Inputcolumnnames//JSONrepresentationoftheoutputschema[<outputschema>],//UDFdefinitionorreference<UDFdefinitionorreference>);
Input columns
The input column names must match the names (or aliases, if applicable) of the columns in the input table or subquery.
For input columns that are records, you must specify—in the input column list—the leaf fields that you want to access from the record.
For example, if you have a record that stores a person's name and age:
person RECORD REPEATED name STRING OPTIONAL age INTEGER OPTIONAL
The input specifier for the name and age would be:
['person.name', 'person.age']
Use of['person'] without the name or age would generate an error.
The resulting output will match the schema; you'll have an array of JavaScript objects, where each object has a "name" and an "age" property. For example:
[ {name: 'alice', age: 23}, {name: 'bob', age: 64}, ... ]Output schema
You must provide BigQuery with the schema or structure of the records your UDF produces, represented as JSON. The schema can containany supported BigQuery data types, including nested records. The supported type specifiers are:
- boolean
- float
- integer
- record
- string
- timestamp
The following code example shows the syntax for records in the output schema. Each output field requires aname andtype attribute. Nested fields must also contain afields attribute.
[{name:'foo_bar',type:'record',fields:[{name:'a',type:'string'},{name:'b',type:'integer'},{name:'c',type:'boolean'}]}]
Each field can contain an optionalmode attribute, which supports the following values:
- nullable : this is the default and may be omitted.
- required : if specified, the given field must be set to a value and cannot be undefined.
- repeated : if specified, the given field must be an array.
Rows passed to theemit() function must match the data types of the output schema. Fields represented in the output schema that are omitted in the emit function will output as nulls.
UDF definition or reference
If you prefer, you can define the UDF inline inbigquery.defineFunction. For example:
bigquery.defineFunction('urlDecode',//NameusedtocallthefunctionfromSQL['title','num_requests'],//Inputcolumnnames//JSONrepresentationoftheoutputschema[{name:'title',type:'string'},{name:'requests',type:'integer'}],//TheUDFfunction(row,emit){emit({title:decodeURI(row.title),requests:row.num_requests});});
Otherwise, you can define the UDF separately, and pass a reference to the function inbigquery.defineFunction. For example:
//TheUDFfunctionurlDecode(row,emit){emit({title:decodeURI(row.title),requests:row.num_requests});}//UDFregistrationbigquery.defineFunction('urlDecode',//NameusedtocallthefunctionfromSQL['title','num_requests'],//Inputcolumnnames//JSONrepresentationoftheoutputschema[{name:'title',type:'string'},{name:'requests',type:'integer'}],urlDecode//Thefunctionreference);
Error handling
If an exception or error is thrown during the processing of a UDF, the entire query will fail. You can use a try-catch block to handle errors. For example:
//TheUDFfunctionurlDecode(row,emit){emit({title:decodeHelper(row.title),requests:row.num_requests});}//HelperfunctionwitherrorhandlingfunctiondecodeHelper(s){try{returndecodeURI(s);}catch(ex){returns;}}//UDFregistrationbigquery.defineFunction('urlDecode',//NameusedtocallthefunctionfromSQL['title','num_requests'],//Inputcolumnnames//JSONrepresentationoftheoutputschema[{name:'title',type:'string'},{name:'requests',type:'integer'}],urlDecode//Thefunctionreference);
Running a query with a UDF
You can use UDFs in legacy SQL with thebq command-line tool or theBigQuery API. TheGoogle Cloud console doesn't support UDFs in legacy SQL.
Using the bq command-line tool
To run a query containing one or more UDFs, specify the--udf_resource flag in the bq command-line tool from the Google Cloud CLI. The value of the flag can be either a Cloud Storage (gs://...) URI or the path to a local file. To specify multiple UDF resource files, repeat this flag.
Use the following syntax to run a query with a UDF:
bq query --udf_resource=<file_path_or_URI> <sql_query>
The following example runs a query that uses a UDF stored in a local file and a SQL querythat is also stored in a local file.
Creating the UDF
You can store the UDF in Cloud Storage or as a local text file. For example, to store the followingurlDecode UDF, create a file namedurldecode.js and paste the following JavaScript code into the file before saving the file.
// UDF definitionfunction urlDecode(row, emit) { emit({title: decodeHelper(row.title), requests: row.num_requests});}// Helper function with error handlingfunction decodeHelper(s) { try { return decodeURI(s); } catch (ex) { return s; }}// UDF registrationbigquery.defineFunction( 'urlDecode', // Name used to call the function from SQL ['title', 'num_requests'], // Input column names // JSON representation of the output schema [{name: 'title', type: 'string'}, {name: 'requests', type: 'integer'}], urlDecode // The function reference);Creating the query
You can also store the query in a file to keep your command line from becoming too verbose. For example, you can create a local file namedquery.sql and paste the following BigQuery statement into the file.
#legacySQLSELECTrequests,titleFROMurlDecode(SELECTtitle,sum(requests)ASnum_requestsFROM[my-project:wikipedia.pagecounts_201504]WHERElanguage='fr'GROUPEACHBYtitle)WHEREtitleLIKE'%ç%'ORDERBYrequestsDESCLIMIT100
After saving the file you can reference the file on the command line.
Running the query
After defining the UDF and the query in separate files, you can reference them in the command line. For example, the following command runs the query that you saved as the file namedquery.sql and references the UDF that you created.
$ bq query --udf_resource=urldecode.js "$(cat query.sql)"
Using the BigQuery API
configuration.query
Queries that use UDFs must containuserDefinedFunctionResources elements that provide the code, or locations to code resources, to be used in the query. The supplied code must include registration function invocations for any UDFs referenced by the query.
Code resources
Your query configuration may include JavaScript code blobs, as well as references to JavaScript source files stored in Cloud Storage.
Inline JavaScript code blobs are populated in theinlineCode section of auserDefinedFunctionResource element. However, code that will be reused or referenced across multiple queries should be persisted in Cloud Storage and referenced as an external resource.
To reference a JavaScript source file from Cloud Storage, set theresourceURI section of theuserDefinedFunctionResource element to the file'sgs:// URI.
The query configuration may contain multipleuserDefinedFunctionResource elements. Each element may contain either aninlineCode or aresourceUri section.
Example
The following JSON example illustrates a query request that references two UDF resources: one blob of inline code, and one filelib.js to be read from Cloud Storage. In this example,myFunc and the registration invocation formyFunc are provided bylib.js.
{"configuration":{"query":{"userDefinedFunctionResources":[{"inlineCode":"var someCode = 'here';"},{"resourceUri":"gs://some-bucket/js/lib.js"}],"query":"select a from myFunc(T);"}}}
Best practices
Developing your UDF
You can useour UDF test tool to test and debug your UDF without running up your BigQuery bill.
Pre-filter your input
If your input can be easily filtered down before being passed to a UDF, your query will likely be faster and cheaper.
In therunning a query example, a subquery is passed as the input tourlDecode, instead of a full table. A table might have billions of rows, and if we ran the UDF on the entire table, the JavaScript framework would need to process many more rows than it would with the filtered subquery.
Avoid persistent mutable state
Do not store or access mutable state across UDF calls. The following code example describes this scenario:
//myCode.jsvarnumRows=0;functiondontDoThis(r,emit){emit({rowCount:++numRows});}//Thequery.SELECTmax(rowCount)FROMdontDoThis(t);
The above example will not behave as expected, because BigQuery shards your query across many nodes. Each node has a standalone JavaScript processing environment that accumulates separate values fornumRows.
Use memory efficiently
The JavaScript processing environment has limited memory available per query. UDF queries that accumulate too much local state may fail due to memory exhaustion.
Expand select queries
You must explicitly list the columns being selected from a UDF.SELECT * FROM<UDF name>(...) isn't supported.
To examine the structure of the input row data, you can useJSON.stringify() to emit a string output column:
bigquery.defineFunction('examineInputFormat',['some','input','columns'],[{name:'input',type:'string'}],function(r,emit){emit({input:JSON.stringify(r)});});
Limits
- The amount of data that your UDF outputs when processing a single row should be approximately 5 MB or less.
- Each user is limited to running approximately 6 UDF queries in a specific project at the same time. If you receive an error that you're over theconcurrent query limit, wait a few minutes and try again.
- A UDF can timeout and prevent your query from completing. Timeouts can be as short as 5 minutes, but can vary depending on several factors, including how much user CPU time your function consumes and how large your inputs and outputs to the JS function are.
- A query job can have a maximum of 50 UDF resources (inline code blobs or external files).
- Each inline code blob is limited to a maximum size of 32 KB. To use larger code resources, store your code in Cloud Storage and reference it as an external resource.
- Each external code resource is limited to a maximum size of 1 MB.
- The cumulative size of all external code resources is limited to a maximum of 5 MB.
Limitations
- The DOM objects
Window,DocumentandNode, and functions that require them, are unsupported. - JavaScript functions that rely on native code are unsupported.
- Bitwise operations in JavaScript handle only the most significant 32 bits.
- Because of their non-deterministic nature, queries that invoke user-defined functions cannot use cached results.
Except as otherwise noted, the content of this page is licensed under theCreative Commons Attribution 4.0 License, and code samples are licensed under theApache 2.0 License. For details, see theGoogle Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-12-15 UTC.