User-defined functions in Python
Preview
This product or feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of the Service Specific Terms. Pre-GA products and features are available "as is" and might have limited support. For more information, see the launch stage descriptions.
Note: For support during the preview, email bq-python-udf-feedback@google.com.

A Python user-defined function (UDF) lets you implement a scalar function in Python and use it in a SQL query. Python UDFs are similar to SQL and JavaScript UDFs, but with additional capabilities. Python UDFs let you install third-party libraries from the Python Package Index (PyPI) and let you access external services using a Cloud resource connection.
Python UDFs are built and run on BigQuery managed resources.
Limitations
- python-3.11 is the only supported runtime.
- You cannot create a temporary Python UDF.
- You cannot use a Python UDF with a materialized view.
- The results of a query that calls a Python UDF are not cached because the return value of a Python UDF is always assumed to be non-deterministic.
- Python UDFs are not fully supported in INFORMATION_SCHEMA views.
- You cannot create or update a Python UDF using the Routine API.
- VPC Service Controls are not supported.
- Customer-managed encryption keys (CMEK) are not supported.
- These data types are not supported: JSON, RANGE, INTERVAL, and GEOGRAPHY.
- Containers that run Python UDFs can be configured with up to 2 vCPU and 8 GiB of memory only.
Required IAM roles
The required IAM roles are based on whether you are a Python UDF owner or a Python UDF user. A Python UDF owner typically creates or updates a UDF. A Python UDF user invokes a UDF created by someone else.
Additional roles are also required if you create or run a Python UDF that references a Cloud resource connection.
UDF owners
If you're creating or updating a Python UDF, the following predefined IAM roles should be granted on the appropriate resource:
| Role | Required permissions | Resource |
|---|---|---|
| BigQuery Data Editor (roles/bigquery.dataEditor) | bigquery.routines.create to create the Python UDF, and bigquery.routines.update to update the Python UDF. | The dataset where the Python UDF is created or updated. |
| BigQuery Job User (roles/bigquery.jobUser) | bigquery.jobs.create to run the query job that contains the CREATE FUNCTION statement. | The project where you're running the CREATE FUNCTION statement. |
| BigQuery Connection Admin (roles/bigquery.connectionAdmin) | bigquery.connections.delegate to use the connection in the CREATE FUNCTION statement. | The connection you're giving access to an external resource. This connection is required only if your UDF uses the WITH CONNECTION clause to access an external service. |
UDF users
If you're invoking a Python UDF, the following predefined IAM roles should be granted on the appropriate resource:
| Role | Required permissions | Resource |
|---|---|---|
| BigQuery User (roles/bigquery.user) | bigquery.jobs.create to run a query job that references the UDF. | The project where you're running a query job that invokes the Python UDF. |
| BigQuery Data Viewer (roles/bigquery.dataViewer) | bigquery.routines.get to run a UDF created by someone else. | The dataset where the Python UDF is stored. |
| BigQuery Connection User (roles/bigquery.connectionUser) | bigquery.connections.use to run a Python UDF that references a Cloud resource connection. | The Cloud resource connection referenced by the Python UDF. This connection is required only if your UDF references a connection. |
For more information about roles in BigQuery, see Predefined IAM roles.
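If you prefer to manage dataset-level access in SQL, the Data Viewer grant for a UDF user can also be expressed with BigQuery's DCL syntax. The following is a minimal sketch; the project, dataset, and user names are hypothetical:

```sql
-- Grant the BigQuery Data Viewer role on the dataset that stores the UDF,
-- so the grantee can run routines in it (hypothetical names).
GRANT `roles/bigquery.dataViewer`
ON SCHEMA `myproject.mydataset`
TO "user:udf-user@example.com";
```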
Create a persistent Python UDF
Follow these rules when you create a Python UDF:
- The body of the Python UDF must be a quoted string literal that represents the Python code. To learn more about quoted string literals, see Formats for quoted literals.
- The body of the Python UDF must include a Python function that is used in the entry_point argument in the Python UDF options list.
- A Python runtime version needs to be specified in the runtime_version option. The only supported Python runtime version is python-3.11. For a full list of available options, see the Function option list for the CREATE FUNCTION statement.
To create a persistent Python UDF, use the CREATE FUNCTION statement without the TEMP or TEMPORARY keyword. To delete a persistent Python UDF, use the DROP FUNCTION statement.
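For example, a persistent UDF can be removed with a statement like the following (the function name here is the multiplyInputs example used later on this page):

```sql
-- Delete the persistent Python UDF; IF EXISTS avoids an error
-- when the function has already been removed.
DROP FUNCTION IF EXISTS `PROJECT_ID.DATASET_ID`.multiplyInputs;
```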
When you create a Python UDF using the CREATE FUNCTION statement, BigQuery creates or updates a container image that is based on a base image. The container is built on the base image using your code and any specified package dependencies. Creating the container is a long-running process. The first query after you run the CREATE FUNCTION statement might automatically wait for the image to complete. Without any external dependencies, the container image should typically be created in less than a minute.
Example
To see an example of creating a persistent Python UDF, choose one of the following options:
Console
The following example creates a persistent Python UDF named multiplyInputs and calls the UDF from within a SELECT statement:
Go to the BigQuery page.
In the query editor, enter the following CREATE FUNCTION statement:

```sql
CREATE FUNCTION `PROJECT_ID.DATASET_ID`.multiplyInputs(x FLOAT64, y FLOAT64)
RETURNS FLOAT64
LANGUAGE python
OPTIONS(runtime_version="python-3.11", entry_point="multiply")
AS r'''
def multiply(x, y):
  return x * y
''';

-- Call the Python UDF.
WITH numbers AS (
  SELECT 1 AS x, 5 AS y
  UNION ALL
  SELECT 2 AS x, 10 AS y
  UNION ALL
  SELECT 3 AS x, 15 AS y
)
SELECT x, y, `PROJECT_ID.DATASET_ID`.multiplyInputs(x, y) AS product
FROM numbers;
```
Replace PROJECT_ID.DATASET_ID with your project ID and dataset ID.
Click Run.
This example produces the following output:
```
+-----+-----+--------------+
|  x  |  y  |   product    |
+-----+-----+--------------+
|  1  |  5  |     5.0      |
|  2  | 10  |     20.0     |
|  3  | 15  |     45.0     |
+-----+-----+--------------+
```
BigQuery DataFrames
The following example uses BigQuery DataFrames to turn a custom function into a Python UDF:
```python
import bigframes.pandas as bpd

# Set BigQuery DataFrames options
bpd.options.bigquery.project = your_gcp_project_id
bpd.options.bigquery.location = "US"

# BigQuery DataFrames gives you the ability to turn your custom functions
# into a BigQuery Python UDF. One can find more details about the usage and
# the requirements via `help` command.
help(bpd.udf)

# Read a table and inspect the column of interest.
df = bpd.read_gbq("bigquery-public-data.ml_datasets.penguins")
df["body_mass_g"].peek(10)

# Define a custom function, and specify the intent to turn it into a
# BigQuery Python UDF. Let's try a `pandas`-like use case in which we want
# to apply a user defined function to every value in a `Series`, more
# specifically bucketize the `body_mass_g` value of the penguins, which is a
# real number, into a category, which is a string.
@bpd.udf(
    dataset=your_bq_dataset_id,
    name=your_bq_routine_id,
)
def get_bucket(num: float) -> str:
    if not num:
        return "NA"
    boundary = 4000
    return "at_or_above_4000" if num >= boundary else "below_4000"

# Then we can apply the udf on the `Series` of interest via
# `apply` API and store the result in a new column in the DataFrame.
df = df.assign(body_mass_bucket=df["body_mass_g"].apply(get_bucket))

# This will add a new column `body_mass_bucket` in the DataFrame. You can
# preview the original value and the bucketized value side by side.
df[["body_mass_g", "body_mass_bucket"]].peek(10)

# The above operation was possible by doing all the computation on the
# cloud through an underlying BigQuery Python UDF that was created to
# support the user's operations in the Python code.

# The BigQuery Python UDF created to support the BigQuery DataFrames
# udf can be located via a property `bigframes_bigquery_function`
# set in the udf object.
print(f"Created BQ Python UDF: {get_bucket.bigframes_bigquery_function}")

# If you have already defined a custom function in BigQuery, either via the
# BigQuery Google Cloud Console or with the `udf` decorator,
# or otherwise, you may use it with BigQuery DataFrames with the
# `read_gbq_function` method. More details are available via the `help`
# command.
help(bpd.read_gbq_function)

existing_get_bucket_bq_udf = get_bucket.bigframes_bigquery_function

# Here is an example of using `read_gbq_function` to load an existing
# BigQuery Python UDF.
df = bpd.read_gbq("bigquery-public-data.ml_datasets.penguins")
get_bucket_function = bpd.read_gbq_function(existing_get_bucket_bq_udf)

df = df.assign(body_mass_bucket=df["body_mass_g"].apply(get_bucket_function))
df.peek(10)

# Let's continue trying other potential use cases of udf. Let's say we
# consider the `species`, `island` and `sex` of the penguins sensitive
# information and want to redact that by replacing with their hash code
# instead. Let's define another scalar custom function and decorate it
# as a udf. The custom function in this example has external package
# dependency, which can be specified via `packages` parameter.
@bpd.udf(
    dataset=your_bq_dataset_id,
    name=your_bq_routine_id,
    packages=["cryptography"],
)
def get_hash(input: str) -> str:
    from cryptography.fernet import Fernet

    # handle missing value
    if input is None:
        input = ""

    key = Fernet.generate_key()
    f = Fernet(key)
    return f.encrypt(input.encode()).decode()

# We can use this udf in another `pandas`-like API `map` that
# can be applied on a DataFrame
df_redacted = df[["species", "island", "sex"]].map(get_hash)
df_redacted.peek(10)

# If the BigQuery routine is no longer needed, we can clean it up
# to free up any cloud quota
session = bpd.get_global_session()
session.bqclient.delete_routine(f"{your_bq_dataset_id}.{your_bq_routine_id}")
```

Create a vectorized Python UDF
You can implement your Python UDF to process a batch of rows instead of a single row by using vectorization. Vectorization can improve query performance.
To control batching behavior, specify the maximum number of rows in each batch by using the max_batching_rows option in the CREATE OR REPLACE FUNCTION option list. If you specify max_batching_rows, BigQuery determines the number of rows in a batch, up to the max_batching_rows limit. If max_batching_rows is not specified, the number of rows to batch is determined automatically.
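For illustration, a minimal sketch of a vectorized UDF that caps each batch at 50 rows; the function and dataset names are hypothetical, and the pandas.DataFrame argument is described in more detail below:

```sql
CREATE OR REPLACE FUNCTION `PROJECT_ID.DATASET_ID`.addOneVectorized(x INT64)
RETURNS INT64
LANGUAGE python
OPTIONS(
  runtime_version="python-3.11",
  entry_point="add_one",
  -- BigQuery chooses the batch size, up to this limit.
  max_batching_rows=50)
AS r'''
import pandas as pd

def add_one(df: pd.DataFrame):
  # df has one column, named after the UDF parameter.
  return df['x'] + 1
''';
```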
A vectorized Python UDF has a single pandas.DataFrame argument that must be annotated. The pandas.DataFrame argument has the same number of columns as the Python UDF parameters defined in the CREATE FUNCTION statement. The column names in the pandas.DataFrame argument have the same names as the UDF's parameters.
Your function needs to return either a pandas.Series or a single-column pandas.DataFrame with the same number of rows as the input.
The following example creates a vectorized Python UDF named multiplyVectorized with two parameters, x and y:
Go to the BigQuery page.
In the query editor, enter the following CREATE FUNCTION statement:

```sql
CREATE FUNCTION `PROJECT_ID.DATASET_ID`.multiplyVectorized(x FLOAT64, y FLOAT64)
RETURNS FLOAT64
LANGUAGE python
OPTIONS(runtime_version="python-3.11", entry_point="vectorized_multiply")
AS r'''
import pandas as pd

def vectorized_multiply(df: pd.DataFrame):
  return df['x'] * df['y']
''';
```
Replace PROJECT_ID.DATASET_ID with your project ID and dataset ID.
Calling the UDF is the same as in the previous example.
Click Run.
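As a variation on the example above, the entry point could equivalently return a single-column pandas.DataFrame instead of a pandas.Series. A minimal sketch; the column name used here is arbitrary:

```sql
CREATE OR REPLACE FUNCTION `PROJECT_ID.DATASET_ID`.multiplyVectorized(x FLOAT64, y FLOAT64)
RETURNS FLOAT64
LANGUAGE python
OPTIONS(runtime_version="python-3.11", entry_point="vectorized_multiply")
AS r'''
import pandas as pd

def vectorized_multiply(df: pd.DataFrame):
  # Wrapping the result in a single-column DataFrame; equivalent to
  # returning the Series df['x'] * df['y'] directly.
  return pd.DataFrame({'product': df['x'] * df['y']})
''';
```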
Supported Python UDF data types
The following table defines the mapping between BigQuery data types, Python data types, and pandas data types:
| BigQuery data type | Python built-in data type used by standard UDF | Pandas data type used by vectorized UDF | PyArrow data type used for ARRAY and STRUCT in vectorized UDF |
|---|---|---|---|
| BOOL | bool | BooleanDtype | DataType(bool) |
| INT64 | int | Int64Dtype | DataType(int64) |
| FLOAT64 | float | FloatDtype | DataType(double) |
| STRING | str | StringDtype | DataType(string) |
| BYTES | bytes | binary[pyarrow] | DataType(binary) |
| TIMESTAMP | Function parameter: datetime.datetime (with UTC timezone). Function return value: datetime.datetime (with any timezone). | Function parameter: timestamp[us, tz=UTC][pyarrow]. Function return value: timestamp[us][pyarrow] (with any timezone). | TimestampType(timestamp[us]), with timezone |
| DATE | datetime.date | date32[pyarrow] | DataType(date32[day]) |
| TIME | datetime.time | time64[pyarrow] | Time64Type(time64[us]) |
| DATETIME | datetime.datetime (without timezone) | timestamp[us][pyarrow] | TimestampType(timestamp[us]), without timezone |
| ARRAY | list | list<...>[pyarrow], where the element data type is a pandas.ArrowDtype | ListType |
| STRUCT | dict | struct<...>[pyarrow], where the field data type is a pandas.ArrowDtype | StructType |
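To make the ARRAY mapping concrete, here is a hedged sketch of a standard (non-vectorized) UDF; the function name is hypothetical. The ARRAY<FLOAT64> argument arrives in Python as a plain list:

```sql
CREATE OR REPLACE FUNCTION `PROJECT_ID.DATASET_ID`.sumArray(xs ARRAY<FLOAT64>)
RETURNS FLOAT64
LANGUAGE python
OPTIONS(runtime_version="python-3.11", entry_point="sum_array")
AS r'''
def sum_array(xs):
  # xs is a Python list; a NULL array is assumed to arrive as None here.
  if xs is None:
    return None
  return sum(xs)
''';

SELECT `PROJECT_ID.DATASET_ID`.sumArray([1.0, 2.5, 3.5]);  -- returns 7.0
```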
Supported runtime versions
BigQuery Python UDFs support the python-3.11 runtime. This Python version includes some additional pre-installed packages. For system libraries, check the runtime base image.
| Runtime version | Python version | Includes | Runtime base image |
|---|---|---|---|
| python-3.11 | Python 3.11 | numpy 1.26.3, pyarrow 14.0.2, pandas 2.1.4, python-dateutil 2.8.2 | google-22-full/python311 |
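Because the pre-installed package versions above are fixed per runtime, you can sanity-check what your code sees at run time. A minimal sketch, with a hypothetical function name:

```sql
CREATE OR REPLACE FUNCTION `PROJECT_ID.DATASET_ID`.numpyVersion()
RETURNS STRING
LANGUAGE python
OPTIONS(runtime_version="python-3.11", entry_point="get_version")
AS r'''
import numpy

def get_version():
  # Expected to match the table above (numpy 1.26.3) when no explicit
  # numpy version is requested in the packages option.
  return numpy.__version__
''';

SELECT `PROJECT_ID.DATASET_ID`.numpyVersion();
```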
Use third-party packages
You can use the CREATE FUNCTION option list to use modules other than those provided by the Python standard library and pre-installed packages. You can install packages from the Python Package Index (PyPI), or you can import Python files from Cloud Storage.
Install a package from the Python Package Index
When you install a package, you must provide the package name, and you can optionally provide the package version using Python package version specifiers. If the package is in the runtime, that package is used unless a particular version is specified in the CREATE FUNCTION option list. If a package version is not specified, and the package isn't in the runtime, the latest available version is used. Only packages with the wheel binary format are supported.
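As a sketch of the specifier forms, the packages option accepts an exact pin, a bounded range, or an unpinned name; the function and the packages chosen here are arbitrary illustrations, not requirements:

```sql
CREATE OR REPLACE FUNCTION `PROJECT_ID.DATASET_ID`.humanSize(n INT64)
RETURNS STRING
LANGUAGE python
OPTIONS(
  entry_point='human_size',
  runtime_version='python-3.11',
  packages=[
    'humanize==4.9.0',    -- exact pin
    'requests>=2.31,<3',  -- bounded range
    'tabulate'            -- unpinned: latest available wheel
  ])
AS r'''
import humanize

def human_size(n):
  # Formats a byte count, for example 1000000 -> "1.0 MB".
  return humanize.naturalsize(n)
''';
```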
The following example shows you how to create a Python UDF that installs the scipy package using the CREATE OR REPLACE FUNCTION option list:
Go to the BigQuery page.
In the query editor, enter the following CREATE FUNCTION statement:

```sql
CREATE FUNCTION `PROJECT_ID.DATASET_ID`.area(radius FLOAT64)
RETURNS FLOAT64
LANGUAGE python
OPTIONS(
  entry_point='area_handler',
  runtime_version='python-3.11',
  packages=['scipy==1.15.3'])
AS r"""
import scipy

def area_handler(radius):
  return scipy.constants.pi * radius * radius
""";

SELECT `PROJECT_ID.DATASET_ID`.area(4.5);
```
Replace PROJECT_ID.DATASET_ID with your project ID and dataset ID.
Click Run.
Import additional Python files as libraries
You can extend your Python UDFs using the Function option list by importing Python files from Cloud Storage.
Note: The user that creates the UDF needs the storage.objects.get permission on the Cloud Storage bucket.

In your UDF's Python code, you can import the Python files from Cloud Storage as modules by using the import statement followed by the path to the Cloud Storage object. For example, if you are importing gs://BUCKET_NAME/path/to/lib1.py, then your import statement would be import path.to.lib1.
The Python filename needs to be a Python identifier. Each folder name in the object name (after the /) should be a valid Python identifier. Within the ASCII range (U+0001..U+007F), the following characters can be used in identifiers:
- Uppercase and lowercase letters A through Z.
- Underscores.
- The digits zero through nine, but a number cannot appear as the first character in the identifier.
The following example shows you how to create a Python UDF that imports the lib1.py client library package from a Cloud Storage bucket named my_bucket:
Go to the BigQuery page.
In the query editor, enter the following CREATE FUNCTION statement:

```sql
CREATE FUNCTION `PROJECT_ID.DATASET_ID`.myFunc(a FLOAT64, b STRING)
RETURNS STRING
LANGUAGE python
OPTIONS(
  entry_point='compute',
  runtime_version='python-3.11',
  library=['gs://my_bucket/path/to/lib1.py'])
AS r"""
import path.to.lib1 as lib1

def compute(a, b):
  # doInterestingStuff is a function defined in
  # gs://my_bucket/path/to/lib1.py
  return lib1.doInterestingStuff(a, b)
""";
```
Replace PROJECT_ID.DATASET_ID with your project ID and dataset ID.
Click Run.
Configure container limits for Python UDFs
You can use the CREATE FUNCTION option list to specify CPU and memory limits for containers that run Python UDFs.
By default, the memory allocated to each container instance is 512 MiB, and the CPU allocated is 0.33 vCPU.
The following example creates a Python UDF using the CREATE FUNCTION option list to specify container limits:
Go to the BigQuery page.
In the query editor, enter the following CREATE FUNCTION statement:

```sql
CREATE FUNCTION `PROJECT_ID.DATASET_ID`.resizeImage(image BYTES)
RETURNS BYTES
LANGUAGE python
OPTIONS(
  entry_point='resize_image',
  runtime_version='python-3.11',
  packages=['Pillow==11.2.1'],
  container_memory='2Gi',
  container_cpu=1)
AS r"""
import io

from PIL import Image

def resize_image(image_bytes):
  img = Image.open(io.BytesIO(image_bytes))
  resized_img = img.resize((256, 256), Image.Resampling.LANCZOS)
  output_stream = io.BytesIO()
  resized_img.convert('RGB').save(output_stream, format='JPEG')
  return output_stream.getvalue()
""";
```
Replace PROJECT_ID.DATASET_ID with your project ID and dataset ID.
Click Run.
Supported CPU values
Python UDFs support fractional CPU values between 0.33 and 1.0, and non-fractional CPU values of 1 and 2. Fractional input values are rounded to two decimal places before they are applied to the container.
Supported memory values
Python UDF containers support memory values in the following format: <integer_number><unit>. The unit must be one of these values: Mi, M, Gi, G. The minimum amount of memory you can configure is 256 mebibytes (256 Mi). The maximum amount of memory you can configure is 8 gibibytes (8 Gi).
Based on the memory value you choose, you must also specify the minimum amount of CPU. The following table shows the minimum CPU values for each memory value:
| Memory | Minimum CPU |
|---|---|
| 512 MiB or less | 0.33 |
| More than 512 MiB | 0.5 |
| More than 1 GiB | 1 |
| More than 4 GiB | 2 |
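For example, combining the two options, a minimal sketch (hypothetical function) that requests 4 GiB of memory, which per the table above requires at least 1 vCPU:

```sql
CREATE OR REPLACE FUNCTION `PROJECT_ID.DATASET_ID`.byteLength(x STRING)
RETURNS INT64
LANGUAGE python
OPTIONS(
  entry_point='byte_length',
  runtime_version='python-3.11',
  -- 4Gi falls in the "more than 1 GiB" row, so container_cpu must be >= 1.
  container_memory='4Gi',
  container_cpu=1)
AS r'''
def byte_length(x):
  # Length of the string in UTF-8 bytes; NULL input returns NULL.
  return len(x.encode("utf-8")) if x is not None else None
''';
```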
Call Google Cloud or online services in Python code
A Python UDF accesses a Google Cloud service or an external service by using the Cloud resource connection service account. The connection's service account must be granted permissions to access the service. The permissions required vary depending on the service that is accessed and the APIs that are called from your Python code.
If you create a Python UDF without using a Cloud resource connection, the function is executed in an environment that blocks network access. If your UDF accesses online services, you must create the UDF with a Cloud resource connection. If you don't, the UDF is blocked from accessing the network until an internal connection timeout is reached.
The following example shows you how to access the Cloud Translation service from a Python UDF. This example uses two projects: a project named my_query_project where you create the UDF and the Cloud resource connection, and a project named my_translate_project where you run Cloud Translation.
Create a Cloud resource connection
First, you create a Cloud resource connection in my_query_project. To create the Cloud resource connection, follow the steps on the Create a Cloud resource connection page.
After you create the connection, open it, and in the Connection info pane, copy the service account ID. You need this ID when you configure permissions for the connection. When you create a connection resource, BigQuery creates a unique system service account and associates it with the connection.
Grant access to the connection's service account
To grant the Cloud resource connection service account access to your projects, grant the service account the Service usage consumer role (roles/serviceusage.serviceUsageConsumer) in my_query_project and the Cloud Translation API user role (roles/cloudtranslate.user) in my_translate_project.
Go to the IAM page.

Verify that my_query_project is selected.

Click Grant Access.

In the New principals field, enter the Cloud resource connection's service account ID that you copied previously.

In the Select a role field, choose Service usage, and then select Service usage consumer.

Click Save.

In the project selector, choose my_translate_project.

Go to the IAM page.

Click Grant Access.

In the New principals field, enter the Cloud resource connection's service account ID that you copied previously.

In the Select a role field, choose Cloud translation, and then select Cloud Translation API user.

Click Save.
Create a Python UDF that calls the Cloud Translation service
In my_query_project, create a Python UDF that calls the Cloud Translation service using your Cloud resource connection.
Go to the BigQuery page.
Enter the following CREATE FUNCTION statement in the query editor:

```sql
CREATE FUNCTION `PROJECT_ID.DATASET_ID`.translate_to_es(x STRING)
RETURNS STRING
LANGUAGE python
WITH CONNECTION `PROJECT_ID.REGION.CONNECTION_ID`
OPTIONS(
  entry_point='do_translate',
  runtime_version='python-3.11',
  packages=['google-cloud-translate>=3.11', 'google-api-core'])
AS r"""
from google.api_core.retry import Retry
from google.cloud import translate

project = "my_translate_project"
translate_client = translate.TranslationServiceClient()

def do_translate(x: str) -> str:
  response = translate_client.translate_text(
      request={
          "parent": f"projects/{project}/locations/us-central1",
          "contents": [x],
          "target_language_code": "es",
          "mime_type": "text/plain",
      },
      retry=Retry(),
  )
  return response.translations[0].translated_text
""";

-- Call the UDF.
WITH text_table AS (
  SELECT "Hello" AS text
  UNION ALL
  SELECT "Good morning" AS text
  UNION ALL
  SELECT "Goodbye" AS text
)
SELECT
  text,
  `PROJECT_ID.DATASET_ID`.translate_to_es(text) AS translated_text
FROM text_table;
```
Replace the following:
- PROJECT_ID.DATASET_ID: your project ID and dataset ID
- REGION.CONNECTION_ID: your connection's region and connection ID
Click Run.
The output should look like the following:
```
+--------------+-----------------+
| text         | translated_text |
+--------------+-----------------+
| Hello        | Hola            |
| Good morning | Buen dia        |
| Goodbye      | Adios           |
+--------------+-----------------+
```
Supported locations
Python UDFs are supported in all BigQuery multi-region and regional locations.
Pricing
Python UDFs are offered without any additional charges.
When billing is enabled, the following apply:
- Python UDF charges are billed using the BigQuery Services SKU.
- The charges are proportional to the amount of compute and memory consumed when the Python UDF is invoked.
- Python UDF customers are also charged for the cost of building or rebuilding the UDF container image. This charge is proportional to the resources used to build the image with customer code and dependencies.
- If Python UDFs result in external or internet network egress, you also see a Premium Tier internet egress charge from Cloud Networking.