Data definition language (DDL) statements in GoogleSQL

Data definition language (DDL) statements let you create and modify BigQuery resources using GoogleSQL query syntax. You can use DDL commands to create, alter, and delete resources such as datasets, tables, table snapshots, and views.

Required permissions

To create a job that runs a DDL statement, you must have the bigquery.jobs.create permission for the project where you are running the job. Each DDL statement also requires specific permissions on the affected resources, which are documented under each statement.

IAM roles

The predefined IAM roles bigquery.user, bigquery.jobUser, and bigquery.admin include the required bigquery.jobs.create permission.

For more information about IAM roles in BigQuery, see Predefined roles and permissions or the IAM permissions reference.

Run DDL statements

You can run DDL statements by using the Google Cloud console, by using the bq command-line tool, by calling the jobs.query REST API, or programmatically using the BigQuery API client libraries.

Console

  1. Go to the BigQuery page in the Google Cloud console.

    Go to BigQuery

  2. Click Compose new query.

  3. Enter the DDL statement into the Query editor text area. For example:

    CREATE TABLE mydataset.newtable ( x INT64 )

  4. Click Run.

bq

Enter the bq query command and supply the DDL statement as the query parameter. Set the use_legacy_sql flag to false.

bq query --use_legacy_sql=false \
  'CREATE TABLE mydataset.newtable ( x INT64 )'

API

Call the jobs.query method and supply the DDL statement in the request body's query property.

DDL functionality extends the information returned by a Jobs resource. statistics.query.statementType includes the following additional values for DDL support:

  • CREATE_TABLE
  • CREATE_TABLE_AS_SELECT
  • DROP_TABLE
  • CREATE_VIEW
  • DROP_VIEW

statistics.query has 2 additional fields:

  • ddlOperationPerformed: The DDL operation performed, possibly dependent on the existence of the DDL target. Current values include:
    • CREATE: The query created the DDL target.
    • SKIP: No-op. Examples: CREATE TABLE IF NOT EXISTS was submitted, and the table exists. Or DROP TABLE IF EXISTS was submitted, and the table does not exist.
    • REPLACE: The query replaced the DDL target. Example: CREATE OR REPLACE TABLE was submitted, and the table already exists.
    • DROP: The query deleted the DDL target.
  • ddlTargetTable: When you submit a CREATE TABLE/VIEW statement or a DROP TABLE/VIEW statement, the target table is returned as an object with 3 fields:
    • "projectId": string
    • "datasetId": string
    • "tableId": string

Java

Call the BigQuery.create() method to start a query job. Call the Job.waitFor() method to wait for the DDL query to finish.

Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryException;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.Job;
import com.google.cloud.bigquery.JobInfo;
import com.google.cloud.bigquery.QueryJobConfiguration;

// Sample to create a view using DDL
public class DDLCreateView {

  public static void runDDLCreateView() {
    // TODO(developer): Replace these variables before running the sample.
    String projectId = "MY_PROJECT_ID";
    String datasetId = "MY_DATASET_ID";
    String tableId = "MY_VIEW_ID";
    String ddl =
        "CREATE VIEW "
            + "`"
            + projectId
            + "."
            + datasetId
            + "."
            + tableId
            + "`"
            + " OPTIONS("
            + " expiration_timestamp=TIMESTAMP_ADD("
            + " CURRENT_TIMESTAMP(), INTERVAL 48 HOUR),"
            + " friendly_name=\"new_view\","
            + " description=\"a view that expires in 2 days\","
            + " labels=[(\"org_unit\", \"development\")]"
            + " )"
            + " AS SELECT name, state, year, number"
            + " FROM `bigquery-public-data.usa_names.usa_1910_current`"
            + " WHERE state LIKE 'W%'";
    ddlCreateView(ddl);
  }

  public static void ddlCreateView(String ddl) {
    try {
      // Initialize client that will be used to send requests. This client only needs to be created
      // once, and can be reused for multiple requests.
      BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

      QueryJobConfiguration config = QueryJobConfiguration.newBuilder(ddl).build();

      // Create the view with a query job and wait for the job to complete.
      Job job = bigquery.create(JobInfo.of(config));
      job = job.waitFor();
      if (job.isDone()) {
        System.out.println("View created successfully");
      } else {
        System.out.println("View was not created");
      }
    } catch (BigQueryException | InterruptedException e) {
      System.out.println("View was not created. \n" + e.toString());
    }
  }
}

Node.js

Before trying this sample, follow the Node.js setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Node.js API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

// Import the Google Cloud client library and create a client
const {BigQuery} = require('@google-cloud/bigquery');
const bigquery = new BigQuery();

async function ddlCreateView() {
  // Creates a view via a DDL query

  /**
   * TODO(developer): Uncomment the following lines before running the sample.
   */
  // const projectId = "my_project"
  // const datasetId = "my_dataset"
  // const tableId = "my_new_view"

  const query = `
  CREATE VIEW \`${projectId}.${datasetId}.${tableId}\`
  OPTIONS(
      expiration_timestamp=TIMESTAMP_ADD(
          CURRENT_TIMESTAMP(), INTERVAL 48 HOUR),
      friendly_name="new_view",
      description="a view that expires in 2 days",
      labels=[("org_unit", "development")]
  )
  AS SELECT name, state, year, number
      FROM \`bigquery-public-data.usa_names.usa_1910_current\`
      WHERE state LIKE 'W%'`;

  // For all options, see https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/query
  const options = {
    query: query,
  };

  // Run the query as a job
  const [job] = await bigquery.createQueryJob(options);
  job.on('complete', metadata => {
    console.log(`Created new view ${tableId} via job ${metadata.id}`);
  });
}

Python

Call the Client.query() method to start a query job. Call the QueryJob.result() method to wait for the DDL query to finish.

Before trying this sample, follow the Python setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Python API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

# from google.cloud import bigquery
# project = 'my-project'
# dataset_id = 'my_dataset'
# table_id = 'new_view'
# client = bigquery.Client(project=project)

sql = """
CREATE VIEW `{}.{}.{}`
OPTIONS(
    expiration_timestamp=TIMESTAMP_ADD(
        CURRENT_TIMESTAMP(), INTERVAL 48 HOUR),
    friendly_name="new_view",
    description="a view that expires in 2 days",
    labels=[("org_unit", "development")]
)
AS SELECT name, state, year, number
    FROM `bigquery-public-data.usa_names.usa_1910_current`
    WHERE state LIKE 'W%'
""".format(project, dataset_id, table_id)

job = client.query(sql)  # API request.
job.result()  # Waits for the query to finish.

print(
    'Created new view "{}.{}.{}".'.format(
        job.destination.project,
        job.destination.dataset_id,
        job.destination.table_id,
    )
)

On-demand query size calculation

If you use on-demand billing, BigQuery charges for data definition language (DDL) queries based on the number of bytes processed by the query.

  • CREATE TABLE: None.
  • CREATE TABLE ... AS SELECT ...: The sum of bytes processed for all the columns referenced from the tables scanned by the query.
  • CREATE VIEW: None.
  • DROP TABLE: None.
  • DROP VIEW: None.

For more information about cost estimation, see Estimate and control costs.

CREATE SCHEMA statement

Creates a new dataset.

Key Point: This SQL statement uses the term SCHEMA to refer to a logical collection of tables, views, and other resources. The equivalent concept in BigQuery is a dataset. In this context, SCHEMA does not refer to BigQuery table schemas.

Syntax

CREATE SCHEMA [ IF NOT EXISTS ]
  [project_name.]dataset_name
  [DEFAULT COLLATE collate_specification]
  [OPTIONS(schema_option_list)]

Arguments

  • IF NOT EXISTS: If any dataset exists with the same name, the CREATE statement has no effect.

  • DEFAULT COLLATE collate_specification: When a new table is created in the dataset, the table inherits a default collation specification unless a collation specification is explicitly specified for a table or a column.

    If you remove or change this collation specification later with the ALTER SCHEMA statement, this will not change existing collation specifications in this dataset. If you want to update an existing collation specification in a dataset, you must alter the column that contains the specification.

  • project_name: The name of the project where you are creating the dataset. Defaults to the project that runs this DDL statement.

  • dataset_name: The name of the dataset to create.

  • schema_option_list: A list of options for creating the dataset.

Details

The dataset is created in the location that you specify in the query settings. For more information, see Specifying your location.

For more information about creating a dataset, see Creating datasets. For information about quotas, see Dataset limits.

schema_option_list

The option list specifies options for the dataset. Specify the options in the following format: NAME=VALUE, ...

The following options are supported:

  • default_kms_key_name (STRING): Specifies the default Cloud KMS key for encrypting table data in this dataset. You can override this value when you create a table.

  • default_partition_expiration_days (FLOAT64): Specifies the default expiration time, in days, for table partitions in this dataset. You can override this value when you create a table.

  • default_rounding_mode (STRING): Example: default_rounding_mode = "ROUND_HALF_EVEN"

    This specifies the defaultRoundingMode that is used for new tables created in this dataset. It does not impact existing tables. The following values are supported:

    • "ROUND_HALF_AWAY_FROM_ZERO": Halfway cases are rounded away from zero. For example, 2.25 is rounded to 2.3, and -2.25 is rounded to -2.3.
    • "ROUND_HALF_EVEN": Halfway cases are rounded towards the nearest even digit. For example, 2.25 is rounded to 2.2, and -2.25 is rounded to -2.2.

  • default_table_expiration_days (FLOAT64): Specifies the default expiration time, in days, for tables in this dataset. You can override this value when you create a table.

  • description (STRING): The description of the dataset.

  • failover_reservation (STRING): Associates the dataset to a reservation in the case of a failover scenario.

  • friendly_name (STRING): A descriptive name for the dataset.

  • is_case_insensitive (BOOL): TRUE if the dataset and its table names are case-insensitive, otherwise FALSE. By default, this is FALSE, which means the dataset and its table names are case-sensitive.

    • Datasets: mydataset and MyDataset can coexist in the same project, unless one of them has case-sensitivity turned off.
    • Tables: mytable and MyTable can coexist in the same dataset if case-sensitivity for the dataset is turned on.

  • is_primary (BOOLEAN): Declares if the dataset is the primary replica.

  • labels (ARRAY<STRUCT<STRING, STRING>>): An array of labels for the dataset, expressed as key-value pairs.

  • location (STRING): The location in which to create the dataset. If you don't specify this option, the dataset is created in the location where the query runs. If you specify this option and also explicitly set the location for the query job, the two values must match; otherwise the query fails.

  • max_time_travel_hours (SMALLINT): Specifies the duration in hours of the time travel window for the dataset. The max_time_travel_hours value must be an integer expressed in multiples of 24 (48, 72, 96, 120, 144, 168) between 48 (2 days) and 168 (7 days). 168 hours is the default if this option isn't specified.

  • primary_replica (STRING): The replica name to set as the primary replica.

  • storage_billing_model (STRING): Sets the storage billing model for the dataset. Set the storage_billing_model value to PHYSICAL to use physical bytes when calculating storage charges, or to LOGICAL to use logical bytes. LOGICAL is the default.

    The storage_billing_model option is only available for datasets that have been updated after December 1, 2022. For datasets that were last updated before that date, the storage billing model is LOGICAL.

    When you change a dataset's billing model, it takes 24 hours for the change to take effect.

    Once you change a dataset's storage billing model, you must wait 14 days before you can change the storage billing model again.

  • tags (ARRAY<STRUCT<STRING, STRING>>): An array of IAM tags for the dataset, expressed as key-value pairs. The key should be the namespaced key name, and the value should be the short name.
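For example (an illustrative sketch; the dataset name is hypothetical), the following creates a dataset that uses physical bytes for storage billing:

CREATE SCHEMA mydataset
OPTIONS(
  storage_billing_model='PHYSICAL'
)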

Required permissions

This statement requires the following IAM permissions:

  • bigquery.datasets.create: The project where you create the dataset.

Examples

Creating a new dataset

The following example creates a dataset with a default table expiration and a set of labels.

CREATE SCHEMA mydataset
OPTIONS(
  location="us",
  default_table_expiration_days=3.75,
  labels=[("label1","value1"),("label2","value2")]
)

Creating a case-insensitive dataset

The following example creates a case-insensitive dataset. Both the dataset name and table names inside the dataset are case-insensitive.

CREATE SCHEMA mydataset
OPTIONS(
  is_case_insensitive=TRUE
)

Creating a dataset with collation support

The following example creates a dataset with a collation specification.

CREATE SCHEMA mydataset
DEFAULT COLLATE 'und:ci'

CREATE TABLE statement

Creates a new table.

Syntax

CREATE [ OR REPLACE ] [ TEMP | TEMPORARY ] TABLE [ IF NOT EXISTS ] table_name
[(
  column | constraint_definition[, ...]
)]
[DEFAULT COLLATE collate_specification]
[PARTITION BY partition_expression]
[CLUSTER BY clustering_column_list]
[WITH CONNECTION connection_name]
[OPTIONS(table_option_list)]
[AS query_statement]

column :=
  column_definition

constraint_definition :=
  [primary_key]
  | [[CONSTRAINT constraint_name] foreign_key, ...]

primary_key :=
  PRIMARY KEY (column_name[, ...]) NOT ENFORCED

foreign_key :=
  FOREIGN KEY (column_name[, ...]) foreign_reference

foreign_reference :=
  REFERENCES primary_key_table (column_name[, ...]) NOT ENFORCED

Arguments

  • OR REPLACE: Replaces any table with the same name if it exists. Cannot appear with IF NOT EXISTS.

  • TEMP | TEMPORARY: Creates a temporary table.

  • IF NOT EXISTS: If any table exists with the same name, the CREATE statement has no effect. Cannot appear with OR REPLACE.

  • table_name: The name of the table to create. See Table path syntax. For temporary tables, do not include the project name or dataset name.

  • column: The table's schema information.

  • constraint_definition: An expression that defines a table constraint.

  • collation_specification: When a new column is added to the table without an explicit collation specification, the column inherits this collation specification for STRING types.

    If you remove or change this collation specification later with the ALTER TABLE statement, this will not change existing collation specifications in this table. If you want to update an existing collation specification in a table, you must alter the column that contains the specification.

    If the table is part of a dataset, the default collation specification for this table overrides the default collation specification for the dataset.

  • partition_expression: An expression that determines how to partition the table.

  • clustering_column_list: A comma-separated list of column references that determine how to cluster the table. You cannot have collation on columns in this list.

  • connection_name: Specifies a connection resource that has credentials for accessing the external data. Specify the connection name in the form PROJECT_ID.LOCATION.CONNECTION_ID. If the project ID or location contains a dash, enclose the connection name in backticks (`). To use a default connection, specify DEFAULT instead of the connection string containing PROJECT_ID.LOCATION.CONNECTION_ID. (See the sketch after this list.)

  • table_option_list: A list of options for creating the table.

  • query_statement: The query from which the table should be created. For the query syntax, see SQL syntax reference. If a collation specification is used on this table, collation passes through this query statement.

  • primary_key: An expression that defines a primary key table constraint. BigQuery only supports unenforced primary keys.

  • foreign_key: An expression that defines a foreign key table constraint. BigQuery only supports unenforced foreign keys.
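As a hedged sketch of WITH CONNECTION (the connection, bucket, and table names are hypothetical, and the storage options shown are the preview options described under table_option_list below):

CREATE TABLE mydataset.newtable (x INT64)
WITH CONNECTION `myproject.us.myconnection`
OPTIONS(
  storage_uri='gs://mybucket/newtable/',
  file_format='PARQUET',
  table_format='ICEBERG'
)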

Details

CREATE TABLE statements must comply with the following rules:

  • Only one CREATE statement is allowed.
  • Either the column list, the AS query_statement clause, or both must be present.
  • When both the column list and the AS query_statement clause are present, BigQuery ignores the names in the AS query_statement clause and matches the columns with the column list by position (see the sketch after this list).
  • When the AS query_statement clause is present and the column list is absent, BigQuery determines the column names and types from the AS query_statement clause.
  • Column names must be specified either through the column list, the AS query_statement clause, or the schema of the table in the LIKE clause.
  • Duplicate column names are not allowed.
  • When both the LIKE and the AS query_statement clause are present, the column list in the query statement must match the columns of the table referenced by the LIKE clause.
  • Table names are case-sensitive unless the dataset they belong to is not. To create a case-insensitive dataset, see Creating a case-insensitive dataset. To alter a dataset to make it case-insensitive, see Turning on case insensitivity for a dataset.
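For example, a minimal sketch of positional matching (the table name and values are illustrative): the query's output names are discarded, and the columns are named x and y from the column list.

CREATE TABLE mydataset.newtable
(
  x INT64,
  y STRING
)
AS SELECT 1 AS some_name, 'a' AS another_name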

Limitations:

  • It is not possible to create an ingestion-time partitioned table from the result of a query. Instead, use a CREATE TABLE DDL statement to create the table, and then use an INSERT DML statement to insert data into it.
  • It is not possible to use the OR REPLACE modifier to replace a table with a different kind of partitioning. Instead, DROP the table, and then use a CREATE TABLE ... AS SELECT ... statement to recreate it.

This statement supports the following variants, which have the same limitations:

  • CREATE TABLE LIKE: Create a table with the same schema as a source table.
  • CREATE TABLE COPY: Create a table by copying the schema and data from a source table.

column

(column_name column_schema[, ...]) contains the table's schema information in a comma-separated list.

Note: Constraints cannot be specified on ARRAY or STRUCT elements.

column := column_name column_schema

column_schema :=
   {
     simple_type
     | STRUCT<field_list>
     | ARRAY<array_element_schema>
   }
   [PRIMARY KEY NOT ENFORCED | REFERENCES table_name(column_name) NOT ENFORCED]
   [DEFAULT default_expression
     | GENERATED ALWAYS AS (generation_expression) STORED
       OPTIONS(generation_option_list)]
   [NOT NULL]
   [OPTIONS(column_option_list)]

simple_type := { data_type | STRING COLLATE collate_specification }

field_list := field_name column_schema [, ...]

array_element_schema := { simple_type | STRUCT<field_list> } [NOT NULL]
  • column_name is the name of the column. A column name:

    • Must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_)
    • Must start with a letter or underscore
    • Can be up to 300 characters

  • column_schema: Similar to a data type, but supports an optional NOT NULL constraint for types other than ARRAY. column_schema also supports options on top-level columns and STRUCT fields.

    column_schema can be used only in the column definition list of CREATE TABLE statements. It cannot be used as a type in expressions.

  • simple_type: Any supported data type aside from STRUCT and ARRAY.

    If simple_type is a STRING, it supports an additional clause for collation, which defines how a resulting STRING can be compared and sorted. The syntax looks like this:

    STRING COLLATE collate_specification

    If you have DEFAULT COLLATE collate_specification assigned to the table, the collation specification for a column overrides the specification for the table.

  • default_expression: The default value assigned to the column (see the sketch after this list). You cannot specify DEFAULT if GENERATED ALWAYS AS is specified.

  • generation_expression: (Preview) An expression for an automatically generated embedding column. Setting this field enables autonomous embedding generation on the table. The only supported generation_expression syntax is a call to the AI.EMBED function.

    • You can't specify GENERATED ALWAYS AS if DEFAULT is specified.
    • The connection_id argument to AI.EMBED is required when used in a generation expression.
    • The type of the column must be STRUCT<result ARRAY<FLOAT64>, status STRING>.

  • generation_option_list: The options for a generated column. The only supported option is asynchronous = TRUE.

  • field_list: Represents the fields in a struct.

  • field_name: The name of the struct field. Struct field names have the same restrictions as column names.

  • NOT NULL: When the NOT NULL constraint is present for a column or field, the column or field is created with REQUIRED mode. Conversely, when the NOT NULL constraint is absent, the column or field is created with NULLABLE mode.

    Columns and fields of ARRAY type do not support the NOT NULL modifier. For example, a column_schema of ARRAY<INT64> NOT NULL is invalid, since ARRAY columns have REPEATED mode and can be empty but cannot be NULL. An array element in a table can never be NULL, regardless of whether the NOT NULL constraint is specified. For example, ARRAY<INT64> is equivalent to ARRAY<INT64 NOT NULL>.

    The NOT NULL attribute of a table's column_schema does not propagate through queries over the table. If table T contains a column declared as x INT64 NOT NULL, for example, CREATE TABLE dataset.newtable AS SELECT x FROM T creates a table named dataset.newtable in which x is NULLABLE.
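As a brief illustration of default_expression (a hedged sketch; the table and column names are hypothetical), the following creates a table whose columns fall back to a string literal and to CURRENT_TIMESTAMP() when no value is supplied:

CREATE TABLE mydataset.orders
(
  order_id INT64,
  status STRING DEFAULT 'PENDING',
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP()
)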

partition_expression

PARTITION BY is an optional clause that controls table and vector index partitioning. partition_expression is an expression that determines how to partition the table or vector index. The partition expression can contain the following values:

  • _PARTITIONDATE. Partition by ingestion time with daily partitions. This syntax cannot be used with the AS query_statement clause.
  • DATE(_PARTITIONTIME). Equivalent to _PARTITIONDATE. This syntax cannot be used with the AS query_statement clause.
  • <date_column>. Partition by a DATE column with daily partitions.
  • DATE({ <timestamp_column> | <datetime_column> }). Partition by a TIMESTAMP or DATETIME column with daily partitions.
  • DATETIME_TRUNC(<datetime_column>, { DAY | HOUR | MONTH | YEAR }). Partition by a DATETIME column with the specified partitioning type.
  • TIMESTAMP_TRUNC(<timestamp_column>, { DAY | HOUR | MONTH | YEAR }). Partition by a TIMESTAMP column with the specified partitioning type.
  • TIMESTAMP_TRUNC(_PARTITIONTIME, { DAY | HOUR | MONTH | YEAR }). Partition by ingestion time with the specified partitioning type. This syntax cannot be used with the AS query_statement clause.
  • DATE_TRUNC(<date_column>, { MONTH | YEAR }). Partition by a DATE column with the specified partitioning type.
  • RANGE_BUCKET(<int64_column>, GENERATE_ARRAY(<start>, <end>[, <interval>])). Partition by an integer column with the specified range (see the sketch after this list), where:

    • start is the start of range partitioning, inclusive.
    • end is the end of range partitioning, exclusive.
    • interval is the width of each range within the partition. Defaults to 1.
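For instance, the following sketch of integer-range partitioning (the table and column names are illustrative) creates one partition per range of 10 customer IDs between 0 and 100:

CREATE TABLE mydataset.customer_orders
(
  customer_id INT64,
  order_amount NUMERIC
)
PARTITION BY RANGE_BUCKET(customer_id, GENERATE_ARRAY(0, 100, 10))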

clustering_column_list

CLUSTER BY is an optional clause that controls table clustering. clustering_column_list is a comma-separated list that determines how to cluster the table. The clustering column list can contain a list of up to four clustering columns.

Note: You cannot have collation on a column in clustering_column_list.

table_option_list

The option list lets you set table options such as a label and an expiration time. You can include multiple options using a comma-separated list.

Specify a table option list in the following format:

NAME=VALUE, ...

NAME and VALUE must be one of the following combinations:

  • expiration_timestamp (TIMESTAMP): Example: expiration_timestamp=TIMESTAMP "2025-01-01 00:00:00 UTC"

    This property is equivalent to the expirationTime table resource property.

  • partition_expiration_days (FLOAT64): Example: partition_expiration_days=7

    Sets the partition expiration in days. For more information, see Set the partition expiration. By default, partitions don't expire.

    This property is equivalent to the timePartitioning.expirationMs table resource property but uses days instead of milliseconds. One day is equivalent to 86400000 milliseconds, or 24 hours.

    This property can only be set if the table is partitioned.

  • require_partition_filter (BOOL): Example: require_partition_filter=true

    Specifies whether queries on this table must include a predicate filter that filters on the partitioning column. For more information, see Set partition filter requirements. The default value is false.

    This property is equivalent to the timePartitioning.requirePartitionFilter table resource property.

    This property can only be set if the table is partitioned.

  • kms_key_name (STRING): Example: kms_key_name="projects/project_id/locations/location/keyRings/keyring/cryptoKeys/key"

    This property is equivalent to the encryptionConfiguration.kmsKeyName table resource property.

    For more details, see Protecting data with Cloud KMS keys.

  • friendly_name (STRING): Example: friendly_name="my_table"

    This property is equivalent to the friendlyName table resource property.

  • description (STRING): Example: description="a table that expires in 2025"

    This property is equivalent to the description table resource property.

  • labels (ARRAY<STRUCT<STRING, STRING>>): Example: labels=[("org_unit", "development")]

    This property is equivalent to the labels table resource property.

  • default_rounding_mode (STRING): Example: default_rounding_mode = "ROUND_HALF_EVEN"

    This specifies the default rounding mode that's used for values written to any new NUMERIC or BIGNUMERIC type columns or STRUCT fields in the table. It does not impact existing fields in the table. The following values are supported:

    • "ROUND_HALF_AWAY_FROM_ZERO": Halfway cases are rounded away from zero. For example, 2.5 is rounded to 3.0, and -2.5 is rounded to -3.0.
    • "ROUND_HALF_EVEN": Halfway cases are rounded towards the nearest even digit. For example, 2.5 is rounded to 2.0, and -2.5 is rounded to -2.0.

    This property is equivalent to the defaultRoundingMode table resource property.

  • enable_change_history (BOOL): In preview. Example: enable_change_history=TRUE

    Set this property to TRUE in order to capture change history on the table, which you can then view by using the CHANGES function. Enabling this table option has an impact on costs; for more information, see Pricing and costs. The default is FALSE.

  • max_staleness (INTERVAL): Example: max_staleness=INTERVAL "4:0:0" HOUR TO SECOND

    The maximum interval behind the current time where it's acceptable to read stale data. For example, with change data capture, when this option is set, the table copy operation is denied if data is more stale than the max_staleness value.

    max_staleness is disabled by default.

  • enable_fine_grained_mutations (BOOL): In preview. Example: enable_fine_grained_mutations=TRUE

    Set this property to TRUE to enable fine-grained DML optimization on the table. The default is FALSE.

  • storage_uri (STRING): In preview. Example: storage_uri=gs://BUCKET_DIRECTORY/TABLE_DIRECTORY/

    A fully qualified location prefix for the external folder where data is stored. Supports gs: buckets.

    Required for managed tables.

  • file_format (STRING): In preview. Example: file_format=PARQUET

    The open-source file format in which the table data is stored. Only PARQUET is supported.

    Required for managed tables. The default is PARQUET.

  • table_format (STRING): In preview. Example: table_format=ICEBERG

    The open table format in which metadata-only snapshots are stored. Only ICEBERG is supported.

    Required for managed tables. The default is ICEBERG.

  • tags (ARRAY<STRUCT<STRING, STRING>>): An array of IAM tags for the table, expressed as key-value pairs. The key should be the namespaced key name, and the value should be the short name.

VALUE is a constant expression containing only literals, query parameters, and scalar functions.

The constant expression cannot contain:

  • A reference to a table
  • Subqueries or SQL statements such as SELECT, CREATE, or UPDATE
  • User-defined functions, aggregate functions, or analytic functions
  • The following scalar functions:
    • ARRAY_TO_STRING
    • REPLACE
    • REGEXP_REPLACE
    • RAND
    • FORMAT
    • LPAD
    • RPAD
    • REPEAT
    • SESSION_USER
    • GENERATE_ARRAY
    • GENERATE_DATE_ARRAY

If VALUE evaluates to NULL, the corresponding option NAME in the CREATE TABLE statement is ignored.
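For example (an illustrative sketch; the table name is hypothetical), a NULL-valued option is simply ignored, so the following creates a table with no description set:

CREATE TABLE mydataset.newtable (x INT64)
OPTIONS(
  description=NULL
)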

column_option_list

Specify a column option list in the following format:

NAME=VALUE, ...

NAME and VALUE must be one of the following combinations:

  • description (STRING): Example: description="a unique id"

    This property is equivalent to the schema.fields[].description table resource property.

  • rounding_mode (STRING): Example: rounding_mode = "ROUND_HALF_EVEN"

    This specifies the rounding mode that's used for values written to a NUMERIC or BIGNUMERIC type column or STRUCT field. The following values are supported:

    • "ROUND_HALF_AWAY_FROM_ZERO": Halfway cases are rounded away from zero. For example, 2.25 is rounded to 2.3, and -2.25 is rounded to -2.3.
    • "ROUND_HALF_EVEN": Halfway cases are rounded towards the nearest even digit. For example, 2.25 is rounded to 2.2, and -2.25 is rounded to -2.2.

    This property is equivalent to the roundingMode table resource property.

  • data_policies (ARRAY<STRING>): Applies a data policy to a column in a table (Preview).

    Example: data_policies = ["{'name':'myproject.region-us.data_policy_name1'}", "{'name':'myproject.region-us.data_policy_name2'}"]

    The ALTER TABLE ALTER COLUMN statement supports the = and += operators to add data policies to a specific column.

    Example: data_policies += ["data_policy1", "data_policy2"]

VALUE is a constant expression containing only literals, query parameters, and scalar functions.

The constant expression cannot contain:

  • A reference to a table
  • Subqueries or SQL statements such as SELECT, CREATE, or UPDATE
  • User-defined functions, aggregate functions, or analytic functions
  • The following scalar functions:
    • ARRAY_TO_STRING
    • REPLACE
    • REGEXP_REPLACE
    • RAND
    • FORMAT
    • LPAD
    • RPAD
    • REPEAT
    • SESSION_USER
    • GENERATE_ARRAY
    • GENERATE_DATE_ARRAY

Setting the VALUE replaces the existing value of that option for the column, if there was one. Setting the VALUE to NULL clears the column's value for that option.

Required permissions

This statement requires the following IAM permissions:

  • bigquery.tables.create: The dataset where you create the table.

In addition, the OR REPLACE clause requires bigquery.tables.update and bigquery.tables.updateData permissions.

If the OPTIONS clause includes any expiration options, then the bigquery.tables.delete permission is also required.

Examples

Creating a new table

The following example creates a partitioned table named newtable in mydataset:

CREATE TABLE mydataset.newtable
(
  x INT64 OPTIONS(description="An optional INTEGER field"),
  y STRUCT<
    a ARRAY<STRING> OPTIONS(description="A repeated STRING field"),
    b BOOL
  >
)
PARTITION BY _PARTITIONDATE
OPTIONS(
  expiration_timestamp=TIMESTAMP "2025-01-01 00:00:00 UTC",
  partition_expiration_days=1,
  description="a table that expires in 2025, with each partition living for 24 hours",
  labels=[("org_unit", "development")]
)

If you haven't configured a default project, prepend a project ID to the dataset name in the example SQL, and enclose the name in backticks if project_id contains special characters: `project_id.dataset.table`. So, instead of mydataset.newtable, your table qualifier might be `myproject.mydataset.newtable`.

If the table name exists in the dataset, the following error is returned:

Already Exists: project_id:dataset.table

The table uses the following partition_expression to partition the table: PARTITION BY _PARTITIONDATE. This expression partitions the table using the date in the _PARTITIONDATE pseudocolumn.

The table schema contains 2 columns:

  • x: An integer, with the description An optional INTEGER field
  • y: A STRUCT containing a (an array of strings, with the description A repeated STRING field) and b (a boolean)

The table option list specifies the following:

  • Table expiration time: January 1, 2025 at 00:00:00 UTC
  • Partition expiration time: 1 day
  • Description: A table that expires in 2025
  • Label: org_unit = development

Creating a new table from an existing table

The following example creates a table named top_words in mydataset from a query:

CREATE TABLE mydataset.top_words
OPTIONS(
  description="Top ten words per Shakespeare corpus"
) AS
SELECT
  corpus,
  ARRAY_AGG(STRUCT(word, word_count) ORDER BY word_count DESC LIMIT 10) AS top_words
FROM `bigquery-public-data`.samples.shakespeare
GROUP BY corpus;

If you haven't configured a default project, prepend a project ID to the dataset name in the example SQL, and enclose the name in backticks if project_id contains special characters: `project_id.dataset.table`. So, instead of mydataset.top_words, your table qualifier might be `myproject.mydataset.top_words`.

If the table name exists in the dataset, the following error is returned:

Already Exists: project_id:dataset.table

The table schema contains 2 columns:

  • corpus: The name of the Shakespeare corpus as a STRING
  • top_words: An ARRAY of STRUCTs, each containing word (a STRING) and word_count (an INT64)

The table option list specifies the following:

  • Description: Top ten words per Shakespeare corpus

Creating a table only if the table doesn't exist

The following example creates a table named newtable in mydataset only if no table named newtable exists in mydataset. If the table name exists in the dataset, no error is returned, and no action is taken.

CREATE TABLE IF NOT EXISTS mydataset.newtable
(
  x INT64,
  y STRUCT<
    a ARRAY<STRING>,
    b BOOL
  >
)
OPTIONS(
  expiration_timestamp=TIMESTAMP "2025-01-01 00:00:00 UTC",
  description="a table that expires in 2025",
  labels=[("org_unit", "development")]
)

If you haven't configured a default project, prepend a project ID to the dataset name in the example SQL, and enclose the name in backticks if project_id contains special characters: `project_id.dataset.table`. So, instead of mydataset.newtable, your table qualifier might be `myproject.mydataset.newtable`.

The table schema contains 2 columns:

  • x: An integer
  • y: A STRUCT containing a (an array of strings) and b (a boolean)

The table option list specifies the following:

  • Expiration time: January 1, 2025 at 00:00:00 UTC
  • Description: A table that expires in 2025
  • Label: org_unit = development

Creating or replacing a table

The following example creates a table named newtable in mydataset, and if newtable exists in mydataset, it is overwritten with an empty table.

CREATE OR REPLACE TABLE mydataset.newtable
(
  x INT64,
  y STRUCT<
    a ARRAY<STRING>,
    b BOOL
  >
)
OPTIONS(
  expiration_timestamp=TIMESTAMP "2025-01-01 00:00:00 UTC",
  description="a table that expires in 2025",
  labels=[("org_unit", "development")]
)

If you haven't configured a default project, prepend a project ID to the dataset name in the example SQL, and enclose the name in backticks if project_id contains special characters: `project_id.dataset.table`. So, instead of mydataset.newtable, your table qualifier might be `myproject.mydataset.newtable`.

The table schema contains 2 columns:

  • x: An integer
  • y: A STRUCT containing a (an array of strings) and b (a boolean)

The table option list specifies the following:

  • Expiration time: January 1, 2025 at 00:00:00 UTC
  • Description: A table that expires in 2025
  • Label: org_unit = development

Creating a table with REQUIRED columns

The following example creates a table named newtable in mydataset. The NOT NULL modifier in the column definition list of a CREATE TABLE statement specifies that a column or field is created in REQUIRED mode.

CREATE TABLE mydataset.newtable
(
  x INT64 NOT NULL,
  y STRUCT<
    a ARRAY<STRING>,
    b BOOL NOT NULL,
    c FLOAT64
  > NOT NULL,
  z STRING
)

If you haven't configured a default project, prepend a project ID to the dataset name in the example SQL, and enclose the name in backticks if project_id contains special characters: `project_id.dataset.table`. So, instead of mydataset.newtable, your table qualifier might be `myproject.mydataset.newtable`.

If the table name exists in the dataset, the following error is returned:

Already Exists: project_id:dataset.table

The table schema contains 3 columns:

  • x: A REQUIRED integer
  • y: A REQUIRED STRUCT containing a (an array of strings), b (a REQUIRED boolean), and c (a NULLABLE float)
  • z: A NULLABLE string

Creating a table with collation support

The following examples create a table named newtable in mydataset with columns a, b, c, and a struct with fields x and y.

All STRING column schemas in this table are collated with 'und:ci':

CREATE TABLE mydataset.newtable
(
  a STRING,
  b STRING,
  c STRUCT<
    x FLOAT64,
    y ARRAY<STRING>
  >
)
DEFAULT COLLATE 'und:ci';

Only b and y are collated with 'und:ci':

CREATE TABLE mydataset.newtable
(
  a STRING,
  b STRING COLLATE 'und:ci',
  c STRUCT<
    x FLOAT64,
    y ARRAY<STRING COLLATE 'und:ci'>
  >
);

Creating a table with parameterized data types

The following example creates a table named newtable in mydataset. The parameters in parentheses specify that the column contains a parameterized data type. See Parameterized Data Types for more information about parameterized types.

CREATE TABLE mydataset.newtable
(
  x STRING(10),
  y STRUCT<
    a ARRAY<BYTES(5)>,
    b NUMERIC(15, 2) OPTIONS(rounding_mode='ROUND_HALF_EVEN'),
    c FLOAT64
  >,
  z BIGNUMERIC(35)
)

If you haven't configured a default project, prepend a project ID to the dataset name in the example SQL, and enclose the name in backticks if project_id contains special characters: `project_id.dataset.table`. So, instead of mydataset.newtable, your table qualifier might be `myproject.mydataset.newtable`.

If the table name exists in the dataset, the following error is returned:

Already Exists: project_id:dataset.table

The table schema contains 3 columns:

  • x: A parameterized string with a maximum length of 10
  • y: A STRUCT containing a (an array of parameterized bytes with a maximum length of 5), b (a parameterized NUMERIC with a maximum precision of 15, maximum scale of 2, and rounding mode set to 'ROUND_HALF_EVEN'), and c (a float)
  • z: A parameterized BIGNUMERIC with a maximum precision of 35 and maximum scale of 0

Creating a partitioned table

The following example creates a partitioned table named newtable in mydataset using a DATE column:

CREATE TABLE mydataset.newtable
(
  transaction_id INT64,
  transaction_date DATE
)
PARTITION BY transaction_date
OPTIONS(
  partition_expiration_days=3,
  description="a table partitioned by transaction_date"
)

If you haven't configured a default project, prepend a project ID to the dataset name in the example SQL, and enclose the name in backticks if project_id contains special characters: `project_id.dataset.table`. So, instead of mydataset.newtable, your table qualifier might be `myproject.mydataset.newtable`.

The table schema contains 2 columns:

  • transaction_id: An integer
  • transaction_date: A date

The table option list specifies the following:

  • Partition expiration: Three days
  • Description: A table partitioned by transaction_date

Creating a partitioned table from the result of a query

The following example creates a partitioned table named days_with_rain in mydataset using a DATE column:

CREATE TABLE mydataset.days_with_rain
PARTITION BY date
OPTIONS(
  partition_expiration_days=365,
  description="weather stations with precipitation, partitioned by day"
) AS
SELECT
  DATE(CAST(year AS INT64), CAST(mo AS INT64), CAST(da AS INT64)) AS date,
  (SELECT ANY_VALUE(name)
   FROM `bigquery-public-data.noaa_gsod.stations` AS stations
   WHERE stations.usaf = stn) AS station_name,  -- Stations can have multiple names
  prcp
FROM `bigquery-public-data.noaa_gsod.gsod2017` AS weather
WHERE prcp != 99.9  -- Filter unknown values
  AND prcp > 0      -- Filter stations/days with no precipitation

If you haven't configured a default project, prepend a project ID to the dataset name in the example SQL, and enclose the name in backticks if project_id contains special characters: `project_id.dataset.table`. So, instead of mydataset.days_with_rain, your table qualifier might be `myproject.mydataset.days_with_rain`.

The table schema contains 3 columns:

  • date: The DATE of data collection
  • station_name: The name of the weather station as a STRING
  • prcp: The amount of precipitation in inches as a FLOAT64

The table option list specifies the following:

  • Partition expiration: One year
  • Description: Weather stations with precipitation, partitioned by day

Creating a clustered table

Example 1

The following example creates a clustered table named myclusteredtable in mydataset. The table is a partitioned table, partitioned by a truncated TIMESTAMP column and clustered by a STRING column named customer_id.

CREATE TABLE mydataset.myclusteredtable
(
  input_timestamp TIMESTAMP,
  customer_id STRING,
  transaction_amount NUMERIC
)
PARTITION BY TIMESTAMP_TRUNC(input_timestamp, HOUR)
CLUSTER BY customer_id
OPTIONS(
  partition_expiration_days=3,
  description="a table clustered by customer_id"
)

If you haven't configured a default project, prepend a project ID to the dataset name in the example SQL, and enclose the name in backticks if project_id contains special characters: `project_id.dataset.table`. So, instead of mydataset.myclusteredtable, your table qualifier might be `myproject.mydataset.myclusteredtable`.

The table schema contains 3 columns:

  • input_timestamp: The time of data collection as a TIMESTAMP
  • customer_id: The customer ID as a STRING
  • transaction_amount: The transaction amount as NUMERIC

The table option list specifies the following:

  • Partition expiration: 3 days
  • Description: A table clustered by customer_id
Example 2

The following example creates a clustered table named myclusteredtable in mydataset. The table is an ingestion-time partitioned table.

CREATE TABLE mydataset.myclusteredtable
(
  customer_id STRING,
  transaction_amount NUMERIC
)
PARTITION BY DATE(_PARTITIONTIME)
CLUSTER BY customer_id
OPTIONS(
  partition_expiration_days=3,
  description="a table clustered by customer_id"
)

If you haven't configured a default project, prepend a project ID to the dataset name in the example SQL, and enclose the name in backticks if project_id contains special characters: `project_id.dataset.table`. So, instead of mydataset.myclusteredtable, your table qualifier might be `myproject.mydataset.myclusteredtable`.

The table schema contains 2 columns:

  • customer_id: The customer ID as a STRING
  • transaction_amount: The transaction amount as NUMERIC

The table option list specifies the following:

  • Partition expiration: 3 days
  • Description: A table clustered by customer_id
Example 3

The following example creates a clustered table named myclusteredtable in mydataset. The table is not partitioned.

CREATE TABLE mydataset.myclusteredtable
(
  customer_id STRING,
  transaction_amount NUMERIC
)
CLUSTER BY customer_id
OPTIONS(
  description="a table clustered by customer_id"
)

If you haven't configured a default project, prepend a project ID to the dataset name in the example SQL, and enclose the name in backticks if project_id contains special characters: `project_id.dataset.table`. So, instead of mydataset.myclusteredtable, your table qualifier might be `myproject.mydataset.myclusteredtable`.

The table schema contains 2 columns:

  • customer_id: The customer ID as a STRING
  • transaction_amount: The transaction amount as NUMERIC

The table option list specifies the following:

  • Description: A table clustered by customer_id

Creating a table with autonomous embedding generation

Preview

This feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of the Service Specific Terms. Pre-GA features are available "as is" and might have limited support. For more information, see the launch stage descriptions.

The following example creates a table named embedded_table in mydataset with an autonomous embedding generation column named embedding that generates embeddings from the content column:

CREATE TABLE mydataset.embedded_table
(
  id INT64,
  content STRING,
  embedding STRUCT<result ARRAY<FLOAT64>, status STRING>
    GENERATED ALWAYS AS (
      AI.EMBED(
        content,
        connection_id => "US.embed_connection",
        endpoint => "text-embedding-005"))
    STORED OPTIONS(asynchronous=true)
);

Creating a clustered table from the result of a query

Example 1

The following example creates a partitioned and clustered table named myclusteredtable in mydataset using the result of a query.

CREATE TABLE mydataset.myclusteredtable
(
  input_timestamp TIMESTAMP,
  customer_id STRING,
  transaction_amount NUMERIC
)
PARTITION BY DATE(input_timestamp)
CLUSTER BY customer_id
OPTIONS(
  partition_expiration_days=3,
  description="a table clustered by customer_id"
) AS
SELECT * FROM mydataset.myothertable

If you haven't configured a default project, prepend a project ID to the dataset name in the example SQL, and enclose the name in backticks if project_id contains special characters: `project_id.dataset.table`. So, instead of mydataset.myclusteredtable, your table qualifier might be `myproject.mydataset.myclusteredtable`.

The table schema contains 3 columns:

  • input_timestamp: The time of data collection as a TIMESTAMP
  • customer_id: The customer ID as a STRING
  • transaction_amount: The transaction amount as NUMERIC

The table option list specifies the following:

  • Partition expiration: 3 days
  • Description: A table clustered by customer_id
Example 2

The following example creates a clustered table named myclusteredtable in mydataset using the result of a query. The table is not partitioned.

CREATE TABLE mydataset.myclusteredtable
(
  customer_id STRING,
  transaction_amount NUMERIC
)
CLUSTER BY customer_id
OPTIONS(
  description="a table clustered by customer_id"
) AS
SELECT * FROM mydataset.myothertable

If you haven't configured a default project, prepend a project ID to the dataset name in the example SQL, and enclose the name in backticks if project_id contains special characters: `project_id.dataset.table`. So, instead of mydataset.myclusteredtable, your table qualifier might be `myproject.mydataset.myclusteredtable`.

The table schema contains 2 columns:

  • customer_id: The customer ID as a STRING
  • transaction_amount: The transaction amount as NUMERIC

The table option list specifies the following:

  • Description: A table clustered by customer_id

Creating a temporary table

The following example creates a temporary table named Example and inserts values into it.

CREATE TEMP TABLE Example
(
  x INT64,
  y STRING
);

INSERT INTO Example
VALUES (5, 'foo');

INSERT INTO Example
VALUES (6, 'bar');

SELECT *
FROM Example;

This script returns the following output:

+-----+---+-----+
| Row | x |  y  |
+-----+---+-----+
| 1   | 5 | foo |
| 2   | 6 | bar |
+-----+---+-----+

Load data across clouds

Example 1

Suppose you have a BigLake table named myawsdataset.orders that references data from Amazon S3. You want to transfer data from that table to a BigQuery table myotherdataset.shipments in the US multi-region.

First, display information about the myawsdataset.orders table:

bq show myawsdataset.orders;

The output is similar to the following:

  Last modified             Schema              Type     Total URIs   Expiration
 ----------------- -------------------------- ---------- ------------ -----------
  31 Oct 17:40:28   |- l_orderkey: integer     EXTERNAL   1
                    |- l_partkey: integer
                    |- l_suppkey: integer
                    |- l_linenumber: integer
                    |- l_returnflag: string
                    |- l_linestatus: string
                    |- l_commitdate: date

Next, display information about the myotherdataset.shipments table:

bq show myotherdataset.shipments

The output is similar to the following. Some columns are omitted to simplify the output.

  Last modified             Schema              Total Rows   Total Bytes   Total Logical
 ----------------- --------------------------- ------------ ------------- ---------------
  31 Oct 17:34:31   |- l_orderkey: integer      3086653      210767042     210767042
                    |- l_partkey: integer
                    |- l_suppkey: integer
                    |- l_commitdate: date
                    |- l_shipdate: date
                    |- l_receiptdate: date
                    |- l_shipinstruct: string
                    |- l_shipmode: string

Now, using the CREATE TABLE AS SELECT statement, you can selectively load data to the myotherdataset.orders table in the US multi-region:

CREATE OR REPLACE TABLE
  myotherdataset.orders
PARTITION BY DATE_TRUNC(l_commitdate, YEAR)
AS
SELECT *
FROM myawsdataset.orders
WHERE EXTRACT(YEAR FROM l_commitdate) = 1992;

Note: If you get a ResourceExhausted error, retry after some time. If the issue persists, you can contact support.

You can then perform a join operation with the newly created table:

SELECT
  orders.l_orderkey,
  orders.l_partkey,
  orders.l_suppkey,
  orders.l_commitdate,
  orders.l_returnflag,
  shipments.l_shipmode,
  shipments.l_shipinstruct
FROM myotherdataset.shipments
JOIN `myotherdataset.orders` AS orders
  ON orders.l_orderkey = shipments.l_orderkey
  AND orders.l_partkey = shipments.l_partkey
  AND orders.l_suppkey = shipments.l_suppkey
WHERE orders.l_returnflag = 'R';  -- 'R' means refunded.

When new data is available, append the data for the year 1993 to the destination table using the INSERT INTO SELECT statement:

INSERT INTO myotherdataset.orders
SELECT *
FROM myawsdataset.orders
WHERE EXTRACT(YEAR FROM l_commitdate) = 1993;

Example 2

The following example inserts data into an ingestion-time partitioned table:

CREATE TABLE mydataset.orders
(
  id STRING,
  numeric_id INT64
)
PARTITION BY _PARTITIONDATE;

After creating a partitioned table, you can insert data into the ingestion-time partitioned table:

INSERT INTO mydataset.orders
  (_PARTITIONTIME, id, numeric_id)
SELECT
  TIMESTAMP("2023-01-01"),
  id,
  numeric_id
FROM mydataset.ordersof23
WHERE numeric_id > 4000000;

CREATE TABLE LIKE statement

Creates a new table with all of the same metadata as another table.

Syntax

CREATE [ OR REPLACE ] TABLE [ IF NOT EXISTS ] table_name
LIKE [[project_name.]dataset_name.]source_table_name
...
[OPTIONS(table_option_list)]

Details

This statement is a variant of the CREATE TABLE statement and has the same limitations. Other than the use of the LIKE clause in place of a column list, the syntax is identical to the CREATE TABLE syntax.

The CREATE TABLE LIKE statement copies only the metadata of the source table. You can use the AS query_statement clause to load data into the new table.

The new table has no relationship to the source table after creation; thus modifications to the source table will not propagate to the new table.

By default, the new table inherits partitioning, clustering, and options metadata from the source table. You can customize metadata in the new table by using the optional clauses in the SQL statement. For example, if you want to specify a different set of options for the new table, then include the OPTIONS clause with a list of options and values. This behavior matches that of ALTER TABLE SET OPTIONS.
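For instance (a minimal sketch; the table names are illustrative), the following copies sourcetable's schema but overrides its description:

CREATE TABLE mydataset.newtable
LIKE mydataset.sourcetable
OPTIONS(
  description="a table that overrides the source table's description"
)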

Required permissions

This statement requires the following IAM permissions:

  • bigquery.tables.create: The dataset where you create the table.
  • bigquery.tables.get: The source table.

In addition, the OR REPLACE clause requires bigquery.tables.update and bigquery.tables.updateData permissions.

If the OPTIONS clause includes any expiration options, then the bigquery.tables.delete permission is also required.

Examples

Example 1

The following example creates a new table named newtable in mydataset with the same metadata as sourcetable:

CREATE TABLE mydataset.newtable
LIKE mydataset.sourcetable

Example 2

The following example creates a new table named newtable in mydataset with the same metadata as sourcetable and the data from the SELECT statement:

CREATE TABLE mydataset.newtable
LIKE mydataset.sourcetable
AS SELECT * FROM mydataset.myothertable

CREATE TABLE COPY statement

Creates a table that has the same metadata and data as another table. The source table can be a table, a table clone, or a table snapshot.

Syntax

CREATE [ OR REPLACE ] TABLE [ IF NOT EXISTS ] table_name
COPY source_table_name
...
[OPTIONS(table_option_list)]

Details

This statement is a variant of the CREATE TABLE statement and has the same limitations. Other than the use of the COPY clause in place of a column list, the syntax is identical to the CREATE TABLE syntax.

The CREATE TABLE COPY statement copies both the metadata and data from the source table.

The new table inherits partitioning and clustering from the source table. By default, the table options metadata from the source table are also inherited, but you can override table options by using the OPTIONS clause. The behavior is equivalent to running ALTER TABLE SET OPTIONS after the table is copied.

The new table has no relationship to the source table after creation; modifications to the source table are not propagated to the new table.

Required permissions

This statement requires the following IAM permissions:

  • bigquery.tables.create: The dataset where you create the table copy.
  • bigquery.tables.get: The source table.
  • bigquery.tables.getData: The source table.

In addition, the OR REPLACE clause requires bigquery.tables.update and bigquery.tables.updateData permissions.

If the OPTIONS clause includes any expiration options, then the bigquery.tables.delete permission is also required.
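Examples

As a minimal sketch (the table names are illustrative), the following copies mydataset.sourcetable into a new table named newtable, overriding its description:

CREATE TABLE mydataset.newtable
COPY mydataset.sourcetable
OPTIONS(
  description="a copy of sourcetable"
)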

CREATE SNAPSHOT TABLE statement

Creates a table snapshot based on a source table. The source table can be a table, a table clone, or a table snapshot.

Syntax

CREATE SNAPSHOT TABLE [ IF NOT EXISTS ] table_snapshot_name
CLONE source_table_name
[FOR SYSTEM_TIME AS OF time_expression]
[OPTIONS(snapshot_option_list)]

Arguments

  • IF NOT EXISTS: If a table snapshot or other table resource exists with the same name, the CREATE statement has no effect.

  • table_snapshot_name: The name of the table snapshot that you want to create. The table snapshot name must be unique per dataset. See Table path syntax.

  • source_table_name: The name of the table that you want to snapshot or the table snapshot that you want to copy. See Table path syntax.

    If the source table is a standard table, then BigQuery creates a table snapshot of the source table. If the source table is a table snapshot, then BigQuery creates a copy of the table snapshot.

  • FOR SYSTEM_TIME AS OF: Lets you select the version of the table that was current at the time specified by time_expression (see the sketch after this list). It can only be used when creating a snapshot of a table; it can't be used when making a copy of a table snapshot.

  • snapshot_option_list: Additional table snapshot creation options such as a label and an expiration time.
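As a hedged sketch of FOR SYSTEM_TIME AS OF (the table names are illustrative), the following snapshots mytable as it existed one hour ago:

CREATE SNAPSHOT TABLE `myproject.mydataset.mytablesnapshot`
CLONE `myproject.mydataset.mytable`
  FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)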

Details

CREATE SNAPSHOT TABLE statements must comply with the following rules:

  • Only one CREATE statement is allowed.
  • The source table must be one of the following:
    • A table
    • A table clone
    • A table snapshot
  • The FOR SYSTEM_TIME AS OF clause can only be used when creating a snapshot of a table or table clone; it can't be used when making a copy of a table snapshot.

snapshot_option_list

The option list lets you set table snapshot options such as a label and an expiration time. You can include multiple options using a comma-separated list.

Specify a table snapshot option list in the following format:

NAME=VALUE, ...

NAME and VALUE must be one of the following combinations:

  • expiration_timestamp (TIMESTAMP): Example: expiration_timestamp=TIMESTAMP "2025-01-01 00:00:00 UTC"

    This property is equivalent to the expirationTime table resource property.

  • friendly_name (STRING): Example: friendly_name="my_table_snapshot"

    This property is equivalent to the friendlyName table resource property.

  • description (STRING): Example: description="A table snapshot that expires in 2025"

    This property is equivalent to the description table resource property.

  • labels (ARRAY<STRUCT<STRING, STRING>>): Example: labels=[("org_unit", "development")]

    This property is equivalent to the labels table resource property.

  • tags (ARRAY<STRUCT<STRING, STRING>>): An array of IAM tags expressed as key-value pairs. The key should be the namespaced key name, and the value should be the short name.

VALUE is a constant expression that contains only literals, query parameters, and scalar functions.

The constant expression cannot contain:

  • A reference to a table
  • Subqueries or SQL statements such as SELECT, CREATE, and UPDATE
  • User-defined functions, aggregate functions, or analytic functions
  • The following scalar functions:
    • ARRAY_TO_STRING
    • REPLACE
    • REGEXP_REPLACE
    • RAND
    • FORMAT
    • LPAD
    • RPAD
    • REPEAT
    • SESSION_USER
    • GENERATE_ARRAY
    • GENERATE_DATE_ARRAY

If VALUE evaluates to NULL, the corresponding option NAME in the CREATE SNAPSHOT TABLE statement is ignored.

Required permissions

This statement requires the following IAM permissions:

  • bigquery.tables.create: The dataset where you create the table snapshot.
  • bigquery.tables.createSnapshot: The source table.
  • bigquery.tables.get: The source table.
  • bigquery.tables.getData: The source table.

Examples

Create a table snapshot: fail if it already exists

The following example creates a table snapshot of the tablemyproject.mydataset.mytable. The table snapshot is created in the datasetmydataset and is namedmytablesnapshot:

CREATESNAPSHOTTABLE`myproject.mydataset.mytablesnapshot`CLONE`myproject.mydataset.mytable`OPTIONS(expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(),INTERVAL48HOUR),friendly_name="my_table_snapshot",description="A table snapshot that expires in 2 days",labels=[("org_unit","development")])

If the table snapshot name already exists in the dataset, then the following error is returned:

Already Exists: myproject.mydataset.mytablesnapshot

The table snapshot option list specifies the following:

  • Expiration time: 48 hours after the time the table snapshot is created
  • Friendly name: my_table_snapshot
  • Description: A table snapshot that expires in 2 days
  • Label: org_unit = development

Create a table snapshot: ignore if it already exists

The following example creates a table snapshot of the table myproject.mydataset.mytable. The table snapshot is created in the dataset mydataset and is named mytablesnapshot:

CREATE SNAPSHOT TABLE IF NOT EXISTS `myproject.mydataset.mytablesnapshot`
CLONE `myproject.mydataset.mytable`
OPTIONS (
  expiration_timestamp = TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 48 HOUR),
  friendly_name = "my_table_snapshot",
  description = "A table snapshot that expires in 2 days",
  labels = [("org_unit", "development")]
);

The table snapshot option list specifies the following:

  • Expiration time: 48 hours after the time the table snapshot is created
  • Friendly name: my_table_snapshot
  • Description: A table snapshot that expires in 2 days
  • Label: org_unit = development

If the table snapshot name already exists in the dataset, then no action is taken, and no error is returned.

For information about restoring table snapshots, see CREATE TABLE CLONE.

For information about removing table snapshots, see DROP SNAPSHOT TABLE.

CREATE TABLE CLONE statement

Creates a table clone based on a source table. The source table can be a table, a table clone, or a table snapshot.

Syntax

CREATE TABLE [IF NOT EXISTS] destination_table_name
CLONE source_table_name [FOR SYSTEM_TIME AS OF time_expression]
[OPTIONS(table_option_list)]

Details

Other than the use of the CLONE clause in place of a column list, the syntax is identical to the CREATE TABLE syntax.

Arguments

  • IF NOT EXISTS: If the specified destination table name already exists, the CREATE statement has no effect.

  • destination_table_name: The name of the table that you want to create. The table name must be unique per dataset. The table name can contain the following:

    • Up to 1,024 characters
    • Letters (upper or lower case), numbers, and underscores
  • OPTIONS(table_option_list): Lets you specify additional table creation options such as a label and an expiration time.

  • source_table_name: The name of the source table.

CREATE TABLE CLONE statements must comply with the following rules:

  • Only one CREATE statement is allowed.
  • The table that is being cloned must be a table, a table clone, or a table snapshot.

OPTIONS

CREATE TABLE CLONE options are the same as CREATE TABLE options.

Required permissions

This statement requires the following IAM permissions:

  • bigquery.tables.create: The dataset where you create the table clone.
  • bigquery.tables.get: The source table.
  • bigquery.tables.getData: The source table.
  • bigquery.tables.restoreSnapshot: The source table (required only if the source table is a table snapshot).

If the OPTIONS clause includes any expiration options, then the bigquery.tables.delete permission is also required.

Examples

Restore a table snapshot: fail if destination table already exists

The following example creates the table myproject.mydataset.mytable from the table snapshot myproject.mydataset.mytablesnapshot:

CREATE TABLE `myproject.mydataset.mytable`
CLONE `myproject.mydataset.mytablesnapshot`
OPTIONS (
  expiration_timestamp = TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 365 DAY),
  friendly_name = "my_table",
  description = "A table that expires in 1 year",
  labels = [("org_unit", "development")]
);

If the table name exists in the dataset, then the following error is returned:

Already Exists: myproject.mydataset.mytable.

The table option list specifies the following:

  • Expiration time: 365 days after the time that the table is created
  • Friendly name: my_table
  • Description: A table that expires in 1 year
  • Label: org_unit = development

Create a clone of a table: ignore if the destination table already exists

The following example creates the table clone myproject.mydataset.mytableclone based on the table myproject.mydataset.mytable:

CREATE TABLE IF NOT EXISTS `myproject.mydataset.mytableclone`
CLONE `myproject.mydataset.mytable`
OPTIONS (
  expiration_timestamp = TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 365 DAY),
  friendly_name = "my_table",
  description = "A table that expires in 1 year",
  labels = [("org_unit", "development")]
);

The table option list specifies the following:

  • Expiration time: 365 days after the time the table is created
  • Friendly name: my_table
  • Description: A table that expires in 1 year
  • Label: org_unit = development

If the table name exists in the dataset, then no action is taken, and no error is returned.
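
The FOR SYSTEM_TIME AS OF clause from the syntax above can be combined with CLONE to clone a table as it existed at an earlier point in time. The following is a minimal sketch; the table names are illustrative, and it assumes the source table existed 24 hours ago and that the timestamp falls within the table's time travel window:

CREATE TABLE `myproject.mydataset.mytable_restored`
CLONE `myproject.mydataset.mytable`
  FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 24 HOUR);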

For information about creating a copy of a table, see CREATE TABLE COPY.

For information about creating a snapshot of a table, see CREATE SNAPSHOT TABLE.

CREATE VIEW statement

Creates a new view.

Syntax

CREATE [OR REPLACE] VIEW [IF NOT EXISTS] view_name
[(view_column_name_list)]
[OPTIONS(view_option_list)]
AS query_expression

view_column_name_list :=
  view_column[, ...]

view_column :=
  column_name [OPTIONS(view_column_option_list)]

Arguments

  • OR REPLACE: Replaces any view with the same name if it exists. Cannot appear with IF NOT EXISTS.

  • IF NOT EXISTS: If a view or other table resource exists with the same name, the CREATE statement has no effect. Cannot appear with OR REPLACE.

  • view_name: The name of the view you're creating. See Table path syntax.

  • view_column_name_list: Lets you explicitly specify the column names of the view, which may be aliases to the column names in the underlying SQL query.

  • view_option_list: Additional view creation options such as a label and an expiration time.

  • query_expression: The GoogleSQL query expression used to define the view.

Details

CREATE VIEW statements must comply with the following rules:

  • Only one CREATE statement is allowed.

view_column_name_list

The view's column name list is optional. The names must be unique but do not have to be the same as the column names of the underlying SQL query. For example, if your view is created with the following statement:

CREATE VIEW mydataset.age_groups(age, count)
AS SELECT age, COUNT(*)
FROM mydataset.people
GROUP BY age;

Then you can query it with:

SELECT age, count FROM mydataset.age_groups;

The number of columns in the column name list must match the number of columns in the underlying SQL query. If columns are added to or dropped from the table that the SQL query references, the view becomes invalid and must be recreated. For example, if the age column is dropped from the mydataset.people table, then the view created in the previous example becomes invalid.

view_column_option_list

The view_column_option_list lets you specify optional top-level column options. Column options for a view have the same syntax and requirements as for a table, but with a different list of NAME and VALUE fields:

  • description (STRING): Example: description="a unique id"

view_option_list

The option list allows you to set view options such as a label and an expiration time. You can include multiple options using a comma-separated list.

Specify a view option list in the following format:

NAME=VALUE, ...

NAME and VALUE must be one of the following combinations:

  • expiration_timestamp (TIMESTAMP): Example: expiration_timestamp=TIMESTAMP "2025-01-01 00:00:00 UTC". This property is equivalent to the expirationTime table resource property.

  • friendly_name (STRING): Example: friendly_name="my_view". This property is equivalent to the friendlyName table resource property.

  • description (STRING): Example: description="a view that expires in 2025". This property is equivalent to the description table resource property.

  • labels (ARRAY<STRUCT<STRING, STRING>>): Example: labels=[("org_unit", "development")]. This property is equivalent to the labels table resource property.

  • privacy_policy (JSON-formatted STRING): The policies to enforce when anyone queries the view. To learn more about the policies available for a view, see the privacy_policy view option. Note: Time travel is disabled on any view that has an analysis rule.

  • tags (ARRAY<STRUCT<STRING, STRING>>): An array of IAM tags for the view, expressed as key-value pairs. The key should be the namespaced key name, and the value should be the short name.

VALUE is a constant expression containing only literals, query parameters, and scalar functions.

The constant expression cannot contain:

  • A reference to a table
  • Subqueries or SQL statements such as SELECT, CREATE, or UPDATE
  • User-defined functions, aggregate functions, or analytic functions
  • The following scalar functions:
    • ARRAY_TO_STRING
    • REPLACE
    • REGEXP_REPLACE
    • RAND
    • FORMAT
    • LPAD
    • RPAD
    • REPEAT
    • SESSION_USER
    • GENERATE_ARRAY
    • GENERATE_DATE_ARRAY

If VALUE evaluates to NULL, the corresponding option NAME in the CREATE VIEW statement is ignored.

privacy_policy

The following policies are available in the privacy_policy view option to create analysis rules. A policy represents a condition that needs to be met before a query can be run.

aggregation_threshold_policy

The aggregation threshold policy to enforce when a view is queried.

Syntax:

'{
  "aggregation_threshold_policy": {
    "threshold": value,
    "privacy_unit_columns": value
  }
}'

Parameters:

  • aggregation_threshold_policy: An aggregation threshold policy for the view. When this parameter is included, a minimum number of distinct entities must be present in a set of data in the view.
  • threshold: The minimum number of distinct privacy units (privacy unit column values) that need to contribute to each row in the query results. If a potential row doesn't satisfy this threshold, that row is omitted from the query results. value is a positive JSON integer.
  • privacy_unit_columns: The column that represents the privacy unit column in a view. At this time, a view can have only one privacy unit column. value is a JSON string.

Example:

privacy_policy='{"aggregation_threshold_policy": {"threshold" : 50, "privacy_unit_columns": "ID"}}'

differential_privacy_policy

A differential privacy policy for the view. When this parameter is included, only differentially private queries can be run on the view.

Syntax:

'{  "differential_privacy_policy": {    "privacy_unit_column": value,    "max_epsilon_per_query": value,    "epsilon_budget": value,    "delta_per_query": value,    "delta_budget": value,    "max_groups_contributed": value  }}'

Parameters:

  • differential_privacy_policy: The differential privacy policy for the view.
  • privacy_unit_column: The column that represents the privacy unit column for differentially private queries on the view. value is a JSON string.
  • max_epsilon_per_query: The maximum amount of epsilon that can be specified for a differentially private query on the view. value is a JSON number from 0.001 to 1e+15.
  • epsilon_budget: The amount of epsilon that can be used in totality for all differentially private queries on the view. value is a JSON number from 0.001 to 1e+15.
  • delta_per_query: The maximum amount of delta that can be specified for a differentially private query on the view. value is a JSON number from 1e-15 to 1.
  • delta_budget: The amount of delta that can be used in totality for all differentially private queries on the view. The budget must be larger than the delta for any differentially private query on the view. value is a JSON number from 1e-15 to 1000.
  • max_groups_contributed: The maximum number of groups to which each protected entity can contribute in a differentially private query. value is a non-negative JSON integer.

Example:

privacy_policy='{"differential_privacy_policy": { "privacy_unit_column": "contributor_id", "max_epsilon_per_query": 0.01, "epsilon_budget": 25.6, "delta_per_query": 0.005, "delta_budget": 9.6, "max_groups_contributed": 2}}'

join_restriction_policy

A join restriction policy for the view. When this parameter is included, only the specified joins can be run on the specified columns in the view.

This policy can be used alone or with other policies, such as the aggregation threshold or differential privacy policy.

Syntax:

'{  "join_restriction_policy": {    "join_condition": value,    "join_allowed_columns": value  }}'

Parameters:

  • join_restriction_policy: The join restriction policy for the view.
  • join_condition: The type of join condition to enforce on the view. value can be one of the following JSON strings:
    • JOIN_ALL: All columns in join_allowed_columns must be inner joined upon for this view to be queried.
    • JOIN_ANY: At least one column in join_allowed_columns must be joined upon for this view to be queried.
    • JOIN_BLOCKED: This view can't be joined along any column. Don't set join_allowed_columns in this case. This can be used with all analysis rules except for the list overlap analysis rule.
    • JOIN_NOT_REQUIRED: A join is not required to query this view. If a join is used, only the columns in join_allowed_columns can be used. This can be used with all analysis rules except for the list overlap analysis rule.
  • join_allowed_columns: A list of columns that can be part of a join operation. value is a JSON array.

Example:

privacy_policy='{"join_restriction_policy": { "join_condition": 'JOIN_ANY', "join_allowed_columns": ['col1', 'col2']}}'
Note: Time travel is disabled on any view that has a policy.
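
To attach one of these policies when creating a view, pass the JSON string through the privacy_policy option. The following is a minimal sketch that applies the aggregation threshold policy from the earlier example; the view and table names are illustrative, and it assumes the underlying table has an ID column:

CREATE VIEW `myproject.mydataset.protected_view`
OPTIONS (
  privacy_policy = '{"aggregation_threshold_policy": {"threshold": 50, "privacy_unit_columns": "ID"}}'
)
AS SELECT * FROM `myproject.mydataset.mytable`;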

Default project in view body

If the view is created in the same project used to run the CREATE VIEW statement, the view body query_expression can reference entities without specifying the project; the default project is the project that owns the view. Consider the sample query below.

CREATE VIEW myProject.myDataset.myView
AS SELECT * FROM anotherDataset.myTable;

After running the above CREATE VIEW query in the project myProject, you can run the query SELECT * FROM myProject.myDataset.myView. Regardless of the project you choose to run this SELECT query, the referenced table anotherDataset.myTable is always resolved against project myProject.

If the view is not created in the same project used to run the CREATE VIEW statement, then all references in the view body query_expression must be qualified with project IDs. For instance, the preceding sample CREATE VIEW query is invalid if it runs in a project different from myProject.

Required permissions

This statement requires the following IAM permissions:

  • bigquery.tables.create: The dataset where you create the view.

In addition, the OR REPLACE clause requires bigquery.tables.update permission.

If the OPTIONS clause includes an expiration time, then the bigquery.tables.delete permission is also required.

Examples

Creating a new view

The following example creates a view named newview in mydataset:

CREATE VIEW `myproject.mydataset.newview`
OPTIONS (
  expiration_timestamp = TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 48 HOUR),
  friendly_name = "newview",
  description = "a view that expires in 2 days",
  labels = [("org_unit", "development")]
)
AS SELECT column_1, column_2, column_3 FROM `myproject.mydataset.mytable`

If the view name exists in the dataset, the following error is returned:

Already Exists: project_id:dataset.table

The view is defined using the following GoogleSQL query:

SELECT column_1, column_2, column_3 FROM `myproject.mydataset.mytable`

The view option list specifies the following:

  • Expiration time: 48 hours from the time the view is created
  • Friendly name: newview
  • Description: A view that expires in 2 days
  • Label: org_unit = development

Creating a view only if the view doesn't exist

The following example creates a view named newview in mydataset only if no view named newview exists in mydataset. If the view name exists in the dataset, no error is returned, and no action is taken.

CREATE VIEW IF NOT EXISTS `myproject.mydataset.newview`
OPTIONS (
  expiration_timestamp = TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 48 HOUR),
  friendly_name = "newview",
  description = "a view that expires in 2 days",
  labels = [("org_unit", "development")]
)
AS SELECT column_1, column_2, column_3 FROM `myproject.mydataset.mytable`

The view is defined using the following GoogleSQL query:

SELECT column_1, column_2, column_3 FROM `myproject.mydataset.mytable`

The view option list specifies the following:

  • Expiration time: 48 hours from the time the view is created
  • Friendly name: newview
  • Description: A view that expires in 2 days
  • Label: org_unit = development

Creating or replacing a view

The following example creates a view named newview in mydataset, and if newview exists in mydataset, it is overwritten using the specified query expression.

CREATE OR REPLACE VIEW `myproject.mydataset.newview`
OPTIONS (
  expiration_timestamp = TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 48 HOUR),
  friendly_name = "newview",
  description = "a view that expires in 2 days",
  labels = [("org_unit", "development")]
)
AS SELECT column_1, column_2, column_3 FROM `myproject.mydataset.mytable`

The view is defined using the following GoogleSQL query:

SELECT column_1, column_2, column_3 FROM `myproject.mydataset.mytable`

The view option list specifies the following:

  • Expiration time: 48 hours from the time the view is created
  • Friendly name: newview
  • Description: A view that expires in 2 days
  • Label: org_unit = development

Creating a view with column descriptions

The following example creates a view named newview in mydataset. This view definition provides the column description for each column in mytable. You can rename columns from the original query.

CREATE VIEW `myproject.mydataset.newview` (
  column_1_new_name OPTIONS (DESCRIPTION='Description of the column 1 contents'),
  column_2_new_name OPTIONS (DESCRIPTION='Description of the column 2 contents'),
  column_3_new_name OPTIONS (DESCRIPTION='Description of the column 3 contents')
)
AS SELECT column_1, column_2, column_3 FROM `myproject.mydataset.mytable`

CREATE MATERIALIZED VIEW statement

Creates a new materialized view.

Syntax

CREATE [OR REPLACE] MATERIALIZED VIEW [IF NOT EXISTS] materialized_view_name
[PARTITION BY partition_expression]
[CLUSTER BY clustering_column_list]
[OPTIONS(materialized_view_option_list)]
AS query_expression

Arguments

  • OR REPLACE: Replaces a materialized view with the same name if it exists. Cannot appear with IF NOT EXISTS.

  • IF NOT EXISTS: If a materialized view or other table resource exists with the same name, the CREATE statement has no effect. Cannot appear with OR REPLACE.

  • materialized_view_name: The name of the materialized view you're creating. See Table path syntax.

    If the project_name is omitted from the materialized view name, or it is the same as the project that runs this DDL query, then the latter is also used as the default project for references to tables, functions, and other resources in query_expression. The default project of the references is fixed and does not depend on the future queries that invoke the new materialized view. Otherwise, all references in query_expression must be qualified with project names.

    The materialized view name must be unique per dataset.

  • partition_expression: An expression that determines how to partition the table. A materialized view can only be partitioned in the same way as the table in query_expression (the base table) is partitioned.

  • clustering_column_list: A comma-separated list of column references that determine how to cluster the materialized view.

  • materialized_view_option_list: Allows you to specify additional materialized view options such as whether refresh is enabled, the refresh interval, a label, and an expiration time.

  • query_expression: The GoogleSQL query expression used to define the materialized view.

Details

CREATE MATERIALIZED VIEW statements must comply with the following rules:

  • Only one CREATE statement is allowed.

Default project in materialized view body

If the materialized view is created in the same project used to run the CREATE MATERIALIZED VIEW statement, the materialized view body query_expression can reference entities without specifying the project; the default project is the project that owns the materialized view. Consider the sample query below.

CREATE MATERIALIZED VIEW myProject.myDataset.myView
AS SELECT * FROM anotherDataset.myTable;

After running the above CREATE MATERIALIZED VIEW query in the project myProject, you can run the query SELECT * FROM myProject.myDataset.myView. Regardless of the project you choose to run this SELECT query, the referenced table anotherDataset.myTable is always resolved against project myProject.

If the materialized view is not created in the same project used to run the CREATE MATERIALIZED VIEW statement, then all references in the materialized view body query_expression must be qualified with project IDs. For instance, the preceding sample CREATE MATERIALIZED VIEW query is invalid if it runs in a project different from myProject.

materialized_view_option_list

The option list allows you to set materialized view options such as whether refresh is enabled, the refresh interval, a label, and an expiration time. You can include multiple options using a comma-separated list.

Specify a materialized view option list in the following format:

NAME=VALUE, ...

NAME and VALUE must be one of the following combinations:

  • enable_refresh (BOOLEAN): Example: enable_refresh=false. Default: true.

  • refresh_interval_minutes (FLOAT64): Example: refresh_interval_minutes=20. Default: refresh_interval_minutes=30.

  • expiration_timestamp (TIMESTAMP): Example: expiration_timestamp=TIMESTAMP "2025-01-01 00:00:00 UTC". This property is equivalent to the expirationTime table resource property. expiration_timestamp is optional and not used by default.

  • max_staleness (INTERVAL): Example: max_staleness=INTERVAL "4:0:0" HOUR TO SECOND. The max_staleness property provides consistently high performance with controlled costs when processing large, frequently changing datasets. max_staleness is disabled by default.

  • allow_non_incremental_definition (BOOLEAN): Example: allow_non_incremental_definition=true. The allow_non_incremental_definition property supports an expanded range of SQL queries to create materialized views. It is disabled by default and is supported only in CREATE MATERIALIZED VIEW statements. The property can't be changed after the materialized view is created.

  • kms_key_name (STRING): Example: kms_key_name="projects/project_id/locations/location/keyRings/keyring/cryptoKeys/key". This property is equivalent to the encryptionConfiguration.kmsKeyName table resource property. See more details about Protecting data with Cloud KMS keys.

  • friendly_name (STRING): Example: friendly_name="my_mv". This property is equivalent to the friendlyName table resource property.

  • description (STRING): Example: description="a materialized view that expires in 2025". This property is equivalent to the description table resource property.

  • labels (ARRAY<STRUCT<STRING, STRING>>): Example: labels=[("org_unit", "development")]. This property is equivalent to the labels table resource property.

  • tags (ARRAY<STRUCT<STRING, STRING>>): An array of IAM tags for the materialized view, expressed as key-value pairs. The key should be the namespaced key name, and the value should be the short name.

Required permissions

This statement requires the following IAM permissions:

  • bigquery.tables.create: The dataset where you create the materialized view.

In addition, the OR REPLACE clause requires bigquery.tables.update permission.

If the OPTIONS clause includes any expiration options, then the bigquery.tables.delete permission is also required.

Examples

Creating a new materialized view

The following example creates a materialized view named new_mv in mydataset:

CREATE MATERIALIZED VIEW `myproject.mydataset.new_mv`
OPTIONS (
  expiration_timestamp = TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 48 HOUR),
  friendly_name = "new_mv",
  description = "a materialized view that expires in 2 days",
  labels = [("org_unit", "development")],
  enable_refresh = true,
  refresh_interval_minutes = 20
)
AS SELECT column_1, SUM(column_2) AS sum_2, AVG(column_3) AS avg_3
FROM `myproject.mydataset.mytable`
GROUP BY column_1

If the materialized view name exists in the dataset, the following error is returned:

Already Exists: project_id:dataset.materialized_view

When you use a DDL statement to create a materialized view, you must specify the project, dataset, and materialized view in the following format: `project_id.dataset.materialized_view` (including the backticks if project_id contains special characters); for example, `myproject.mydataset.new_mv`.

The materialized view is defined using the following GoogleSQL query:

SELECT column_1, SUM(column_2) AS sum_2, AVG(column_3) AS avg_3
FROM `myproject.mydataset.mytable`
GROUP BY column_1

The materialized view option list specifies the following:

  • Expiration time: 48 hours from the time the materialized view is created
  • Friendly name: new_mv
  • Description: A materialized view that expires in 2 days
  • Label: org_unit = development
  • Refresh enabled: true
  • Refresh interval: 20 minutes

Creating a materialized view only if the materialized view doesn't exist

The following example creates a materialized view named new_mv in mydataset only if no materialized view named new_mv exists in mydataset. If the materialized view name exists in the dataset, no error is returned, and no action is taken.

CREATE MATERIALIZED VIEW IF NOT EXISTS `myproject.mydataset.new_mv`
OPTIONS (
  expiration_timestamp = TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 48 HOUR),
  friendly_name = "new_mv",
  description = "a view that expires in 2 days",
  labels = [("org_unit", "development")],
  enable_refresh = false
)
AS SELECT column_1, column_2, column_3 FROM `myproject.mydataset.mytable`

The materialized view is defined using the following GoogleSQL query:

SELECT column_1, column_2, column_3 FROM `myproject.mydataset.mytable`

The materialized view option list specifies the following:

  • Expiration time: 48 hours from the time the view is created
  • Friendly name: new_mv
  • Description: A view that expires in 2 days
  • Label: org_unit = development
  • Refresh enabled: false

Creating a materialized view with partitioning and clustering

The following example creates a materialized view named new_mv in mydataset, partitioned by the col_datetime column and clustered by the col_int column:

CREATE MATERIALIZED VIEW `myproject.mydataset.new_mv`
PARTITION BY DATE(col_datetime)
CLUSTER BY col_int
AS SELECT col_int, col_datetime, COUNT(1) AS cnt
FROM `myproject.mydataset.mv_base_table`
GROUP BY col_int, col_datetime

The base table, mv_base_table, must also be partitioned by the col_datetime column. For more information, see Working with partitioned and clustered tables.
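
The max_staleness and allow_non_incremental_definition options described above can be combined in one statement. The following is a minimal sketch, not taken from the preceding examples: the names are illustrative, and COUNT(DISTINCT ...) stands in for a query that can't be maintained incrementally:

CREATE MATERIALIZED VIEW `myproject.mydataset.non_incremental_mv`
OPTIONS (
  allow_non_incremental_definition = true,
  max_staleness = INTERVAL "4:0:0" HOUR TO SECOND
)
AS SELECT column_1, COUNT(DISTINCT column_2) AS distinct_2
FROM `myproject.mydataset.mytable`
GROUP BY column_1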

CREATE MATERIALIZED VIEW AS REPLICA OF statement

Creates a replica of a materialized view. The source materialized view must be over an Amazon Simple Storage Service (Amazon S3) BigLake table. You can use the materialized view replica to make Amazon S3 data available locally for joins.

For more information, see Create materialized view replicas.

Syntax

CREATE MATERIALIZED VIEW replica_name
[OPTIONS(materialized_view_replica_option_list)]
AS REPLICA OF source_materialized_view_name

Arguments

  • replica_name: The name of the materialized view replica you're creating, in table path syntax. If the project name is omitted from the materialized view replica name, the current project is used as the default.

    The materialized view replica name must be unique for each dataset.

  • materialized_view_replica_option_list: Allows you to specify options such as the replication interval.

  • source_materialized_view_name: The name of the materialized view you are replicating, in table path syntax. The source materialized view must be over an Amazon S3 BigLake table, and must be authorized on the dataset that contains that table.

materialized_view_replica_option_list

The option list lets you set materialized view replica options.

Specify a materialized view replica option list in the following format:

NAME=VALUE, ...

  • replication_interval_seconds (INT64): Specifies how often to replicate the data from the source materialized view to the replica. Must be a value between 60 and 3,600, inclusive. Defaults to 300 (5 minutes). Example: replication_interval_seconds=900

Required permissions

This statement requires the following IAM permissions:

  • bigquery.tables.create
  • bigquery.tables.get
  • bigquery.tables.getData
  • bigquery.tables.replicateData
  • bigquery.jobs.create

Example

The following example creates a materialized view replica named mv_replica in bq_dataset:

CREATE MATERIALIZED VIEW `myproject.bq_dataset.mv_replica`
OPTIONS (replication_interval_seconds = 600)
AS REPLICA OF `myproject.s3_dataset.my_s3_mv`

CREATE EXTERNAL SCHEMA statement

Creates a new federated dataset.

A federated dataset is a connection between BigQuery and an external data source at the dataset level. For more information about creating federated datasets, see the following:

Syntax

CREATE EXTERNAL SCHEMA [IF NOT EXISTS] dataset_name
[WITH CONNECTION connection_name]
[OPTIONS(external_schema_option_list)]

Arguments

  • IF NOT EXISTS: If any dataset exists with the same name, the CREATE statement has no effect.

  • dataset_name: The name of the dataset to create.

  • connection_name: Specifies a connection resource that has credentials for accessing the external data. Specify the connection name in the form PROJECT_ID.LOCATION.CONNECTION_ID. If the project ID or location contains a dash, enclose the connection name in backticks (`).

  • external_schema_option_list: A list of options for creating the federated dataset.

Details

The dataset is created in the location that you specify in the query settings. For more information, see Specify locations. The location must support the kind of federated dataset that you are creating; for example, you can only create AWS Glue federated datasets in AWS locations.

For more information about creating a dataset, see Create datasets. For information about quotas, see dataset limits.

external_schema_option_list

The option list specifies options for the federated dataset. Specify the options in the following format: NAME=VALUE, ...

The following options are supported:

  • description (STRING): The description of the dataset.
  • friendly_name (STRING): A descriptive name for the dataset.
  • labels (ARRAY<STRUCT<STRING, STRING>>): An array of labels for the dataset, expressed as key-value pairs.
  • location (STRING): The location in which to create the dataset. If you don't specify this option, the dataset is created in the location where the query runs. If you specify this option and also explicitly set the location for the query job, the two values must match; otherwise the query fails. The location must support the kind of federated dataset that you are creating; for example, you can only create AWS Glue federated datasets in AWS locations.
  • external_source (STRING): The source of the external dataset. For AWS Glue federated datasets, this must be an Amazon Resource Name (ARN) with a prefix identifying the source, such as aws-glue://. For Spanner federated datasets, this must be a specific Spanner database with a google-cloudspanner:/ prefix. For example: google-cloudspanner:/projects/my_project/instances/my_instance/databases/my_database.
  • tags (ARRAY<STRUCT<STRING, STRING>>): An array of IAM tags expressed as key-value pairs. The key should be the namespaced key name, and the value should be the short name.

Required permissions

This statement requires the following IAM permissions:

  • bigquery.datasets.create: The project where you create the federated dataset.
  • bigquery.connections.use: The project where you create the federated dataset.
  • bigquery.connections.delegate: The project where you create the federated dataset.

Examples

The following example creates an AWS Glue federated dataset:

CREATE EXTERNAL SCHEMA mydataset
WITH CONNECTION myproject.`aws-us-east-1`.myconnection
OPTIONS (
  external_source = 'aws-glue://arn:aws:glue:us-east-1:123456789:database/test_database',
  location = 'aws-us-east-1');
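
For Spanner, the external_source uses the google-cloudspanner:/ prefix described above. The following is a minimal sketch, not a definitive example: the connection name, region, and database path are illustrative assumptions.

CREATE EXTERNAL SCHEMA spanner_dataset
WITH CONNECTION `myproject.us-central1.myconnection`
OPTIONS (
  external_source = 'google-cloudspanner:/projects/my_project/instances/my_instance/databases/my_database',
  location = 'us-central1');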

CREATE EXTERNAL TABLE statement

Creates a new external table.

External tables let BigQuery query data that is stored outside of BigQuery storage. For more information about external tables, see Introduction to external data sources.

Syntax

CREATE [OR REPLACE] EXTERNAL TABLE [IF NOT EXISTS] table_name
[(column_name column_schema, ...)]
[WITH CONNECTION {connection_name | DEFAULT}]
[WITH PARTITION COLUMNS [(partition_column_name partition_column_type, ...)]]
OPTIONS(external_table_option_list, ...);

Arguments

  • OR REPLACE: Replaces any external table with the same name if it exists. Cannot appear with IF NOT EXISTS.

  • IF NOT EXISTS: If an external table or other table resource exists with the same name, the CREATE statement has no effect. Cannot appear with OR REPLACE.

  • table_name: The name of the external table. See Table path syntax.

  • column_name: The name of a column in the table.

  • column_schema: Specifies the schema of the column. It uses the same syntax as the column_schema definition in the CREATE TABLE statement. If you don't include this clause, BigQuery detects the schema automatically.

  • connection_name: Specifies a connection resource that has credentials for accessing the external data. Specify the connection name in the form PROJECT_ID.LOCATION.CONNECTION_ID. If the project ID or location contains a dash, enclose the connection name in backticks (`). To use a default connection, specify DEFAULT instead of the connection string containing PROJECT_ID.LOCATION.CONNECTION_ID.

  • partition_column_name: The name of a partition column. Include this field if your external data uses a hive-partitioned layout. For more information, see Supported data layouts.

  • partition_column_type: The partition column type.

  • external_table_option_list: A list of options for creating the external table.

Details

The CREATE EXTERNAL TABLE statement does not support creating temporary external tables.

To create an externally partitioned table, use the WITH PARTITION COLUMNS clause to specify the partition schema details. BigQuery validates the column definitions against the external data location. The schema declaration must strictly follow the ordering of the fields in the external path. For more information about external partitioning, see Querying externally partitioned data.

external_table_option_list

The option list specifies options for creating the external table. The format and uris options are required. Specify the option list in the following format: NAME=VALUE, ...

  • allow_jagged_rows (BOOL): If true, allow rows that are missing trailing optional columns. Applies to CSV data.

  • allow_quoted_newlines (BOOL): If true, allow quoted data sections that contain newline characters in the file. Applies to CSV data.

  • bigtable_options (STRING): Only required when creating a Bigtable external table. Specifies the schema of the Bigtable external table in JSON format. For a list of Bigtable table definition options, see BigtableOptions in the REST API reference.

  • compression (STRING): The compression type of the data source. Supported values include: GZIP. If not specified, the data source is uncompressed. Applies to CSV and JSON data.

  • decimal_target_types (ARRAY<STRING>): Determines how to convert a Decimal type. Equivalent to ExternalDataConfiguration.decimal_target_types. Example: ["NUMERIC", "BIGNUMERIC"].

  • description (STRING): A description of this table.

  • enable_list_inference (BOOL): If true, use schema inference specifically for the Parquet LIST logical type. Applies to Parquet data.

  • enable_logical_types (BOOL): If true, convert Avro logical types into their corresponding SQL types. For more information, see Logical types. Applies to Avro data.

  • encoding (STRING): The character encoding of the data. Supported values include: UTF8 (or UTF-8), ISO_8859_1 (or ISO-8859-1), UTF-16BE, UTF-16LE, UTF-32BE, or UTF-32LE. The default value is UTF-8. Applies to CSV data.

  • enum_as_string (BOOL): If true, infer the Parquet ENUM logical type as STRING instead of BYTES by default. Applies to Parquet data.

  • expiration_timestamp (TIMESTAMP): The time when this table expires. If not specified, the table does not expire. Example: "2025-01-01 00:00:00 UTC".

  • field_delimiter (STRING): The separator for fields in a CSV file. Applies to CSV data.

  • format (STRING): The format of the external data. Supported values for CREATE EXTERNAL TABLE include: AVRO, CLOUD_BIGTABLE, CSV, DATASTORE_BACKUP, DELTA_LAKE (preview), GOOGLE_SHEETS, NEWLINE_DELIMITED_JSON (or JSON), ORC, PARQUET. Supported values for LOAD DATA include: AVRO, CSV, DELTA_LAKE (preview), NEWLINE_DELIMITED_JSON (or JSON), ORC, PARQUET. The value JSON is equivalent to NEWLINE_DELIMITED_JSON.

  • hive_partition_uri_prefix (STRING): A common prefix for all source URIs before the partition key encoding begins. Applies only to hive-partitioned external tables. Applies to Avro, CSV, JSON, Parquet, and ORC data. Example: "gs://bucket/path".

  • file_set_spec_type (STRING): Specifies how to interpret source URIs for load jobs and external tables. Supported values include:

    • FILE_SYSTEM_MATCH: Expands source URIs by listing files from the object store. This is the default behavior if FileSetSpecType is not set.
    • NEW_LINE_DELIMITED_MANIFEST: Indicates that the provided URIs are newline-delimited manifest files, with one URI per line. Wildcard URIs are not supported in the manifest files, and all referenced data files must be in the same bucket as the manifest file.

    For example, if you have a source URI of "gs://bucket/path/file" and the file_set_spec_type is FILE_SYSTEM_MATCH, then the file is used directly as a data file. If the file_set_spec_type is NEW_LINE_DELIMITED_MANIFEST, then each line in the file is interpreted as a URI that points to a data file.

  • ignore_unknown_values (BOOL): If true, ignore extra values that are not represented in the table schema, without returning an error. Applies to CSV and JSON data.

  • json_extension (STRING): For JSON data, indicates a particular JSON interchange format. If not specified, BigQuery reads the data as generic JSON records. Supported values include: GEOJSON (newline-delimited GeoJSON data). For more information, see Creating an external table from a newline-delimited GeoJSON file.

  • max_bad_records (INT64): The maximum number of bad records to ignore when reading the data. Applies to CSV, JSON, and Google Sheets data.

  • max_staleness (INTERVAL): Applicable for BigLake tables and object tables. Specifies whether cached metadata is used by operations against the table, and how fresh the cached metadata must be in order for the operation to use it. To disable metadata caching, specify 0. This is the default. To enable metadata caching, specify an interval literal value between 30 minutes and 7 days. For example, specify INTERVAL 4 HOUR for a 4 hour staleness interval. With this value, operations against the table use cached metadata if it has been refreshed within the past 4 hours. If the cached metadata is older than that, the operation falls back to retrieving metadata from Cloud Storage instead.

  • null_marker (STRING): The string that represents NULL values in a CSV file. Applies to CSV data.

  • null_markers (ARRAY<STRING>) (Preview): The list of strings that represent NULL values in a CSV file. This option cannot be used with the null_marker option. Applies to CSV data.

  • object_metadata (STRING): Only required when creating an object table. Set the value of this option to SIMPLE when creating an object table.

  • preserve_ascii_control_characters (BOOL): If true, the embedded ASCII control characters, which are the first 32 characters in the ASCII table, ranging from '\x00' to '\x1F', are preserved. Applies to CSV data.

  • projection_fields (STRING): A list of entity properties to load. Applies to Datastore data.

  • quote (STRING): The string used to quote data sections in a CSV file. If your data contains quoted newline characters, also set the allow_quoted_newlines property to true. Applies to CSV data.

  • reference_file_schema_uri (STRING): User-provided reference file with the table schema. Applies to Parquet, ORC, and Avro data. Example: "gs://bucket/path/reference_schema_file.parquet".

  • require_hive_partition_filter (BOOL): If true, all queries over this table require a partition filter that can be used to eliminate partitions when reading data. Applies only to hive-partitioned external tables. Applies to Avro, CSV, JSON, Parquet, and ORC data.

  • sheet_range (STRING): Range of a Google Sheets spreadsheet to query from. Applies to Google Sheets data. Example: "sheet1!A1:B20".

  • skip_leading_rows (INT64): The number of rows at the top of a file to skip when reading the data. Applies to CSV and Google Sheets data.

  • source_column_match (STRING) (Preview): Controls the strategy used to match loaded columns to the schema. If this value is unspecified, then the default is based on how the schema is provided. If autodetect is enabled, then the default behavior is to match columns by name. Otherwise, the default is to match columns by position. This is done to keep the behavior backward-compatible. Supported values include:

    • POSITION: Matches by position. This option assumes that the columns are ordered the same way as the schema.
    • NAME: Matches by name. This option reads the header row as column names and reorders columns to match the field names in the schema. Column names are read from the last skipped row based on the skip_leading_rows property.

  • tags (ARRAY<STRUCT<STRING, STRING>>): An array of IAM tags for the table, expressed as key-value pairs. The key should be the namespaced key name, and the value should be the short name.

  • time_zone (STRING) (Preview): Default time zone that applies when parsing timestamp values that have no specific time zone. Check valid time zone names. If this value is not present, timestamp values without a specific time zone are parsed using the default time zone UTC. Applies to CSV and JSON data.

  • date_format (STRING) (Preview): Format elements that define how DATE values are formatted in the input files (for example, MM/DD/YYYY). If this value is present, this format is the only compatible DATE format. Schema autodetection will also decide the DATE column type based on this format instead of the existing format. If this value is not present, the DATE field is parsed with the default formats. Applies to CSV and JSON data.

  • datetime_format (STRING) (Preview): Format elements that define how DATETIME values are formatted in the input files (for example, MM/DD/YYYY HH24:MI:SS.FF3). If this value is present, this format is the only compatible DATETIME format. Schema autodetection will also decide the DATETIME column type based on this format instead of the existing format. If this value is not present, the DATETIME field is parsed with the default formats. Applies to CSV and JSON data.

  • time_format (STRING) (Preview): Format elements that define how TIME values are formatted in the input files (for example, HH24:MI:SS.FF3). If this value is present, this format is the only compatible TIME format. Schema autodetection will also decide the TIME column type based on this format instead of the existing format. If this value is not present, the TIME field is parsed with the default formats. Applies to CSV and JSON data.

  • timestamp_format (STRING) (Preview): Format elements that define how TIMESTAMP values are formatted in the input files (for example, MM/DD/YYYY HH24:MI:SS.FF3). If this value is present, this format is the only compatible TIMESTAMP format. Schema autodetection will also decide the TIMESTAMP column type based on this format instead of the existing format. If this value is not present, the TIMESTAMP field is parsed with the default formats. Applies to CSV and JSON data.

  • uris: For external tables, including object tables, that aren't Bigtable tables, the value is ARRAY<STRING>: an array of fully qualified URIs for the external data locations. Each URI can contain one asterisk (*) wildcard character, which must come after the bucket name. When you specify uris values that target multiple files, all of those files must share a compatible schema. The following examples show valid uris values:

    • ['gs://bucket/path1/myfile.csv']
    • ['gs://bucket/path1/*.csv']
    • ['gs://bucket/path1/*', 'gs://bucket/path2/file00*']

    For Bigtable tables, the value is STRING: the URI identifying the Bigtable table to use as a data source. You can only specify one Bigtable URI. Example: https://googleapis.com/bigtable/projects/project_id/instances/instance_id[/appProfiles/app_profile]/tables/table_name. For more information on constructing a Bigtable URI, see Retrieve the Bigtable URI.

Required permissions

This statement requires the following IAM permissions:

  • bigquery.tables.create: The dataset where you create the external table.

In addition, the OR REPLACE clause requires bigquery.tables.update permission.

If the OPTIONS clause includes an expiration time, then the bigquery.tables.delete permission is also required.

Examples

The following example creates a BigLake table and explicitly specifies the schema. It also specifies refreshing metadata cache automatically at a system-defined interval.

CREATE OR REPLACE EXTERNAL TABLE mydataset.newtable (x INT64, y STRING, z BOOL)
WITH CONNECTION myconnection
OPTIONS (
  format = "PARQUET",
  max_staleness = STALENESS_INTERVAL,
  metadata_cache_mode = 'AUTOMATIC');
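
The syntax also accepts DEFAULT in place of an explicit connection name. The following is a minimal sketch assuming a default connection has been configured for the project; the table name and URI are illustrative:

CREATE EXTERNAL TABLE mydataset.defaulttable (x INT64)
WITH CONNECTION DEFAULT
OPTIONS (
  format = "PARQUET",
  uris = ['gs://bucket/path/*.parquet']);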

The following example creates an external table from multiple URIs. The data format is CSV. This example uses schema auto-detection.

CREATE EXTERNAL TABLE dataset.CsvTable
OPTIONS (
  format = 'CSV',
  uris = ['gs://bucket/path1.csv', 'gs://bucket/path2.csv']);

The following example creates an external table from a CSV file and explicitly specifies the schema. It also specifies the field delimiter ('|') and sets the maximum number of bad records allowed.

CREATE OR REPLACE EXTERNAL TABLE dataset.CsvTable (x INT64, y STRING)
OPTIONS (
  format = 'CSV',
  uris = ['gs://bucket/path1.csv'],
  field_delimiter = '|',
  max_bad_records = 5);

The following example creates an externally partitioned table. It uses schema auto-detection to detect both the file schema and the hive partitioning layout. If the external path is gs://bucket/path/field_1=first/field_2=1/data.parquet, the partition columns are detected as field_1 (STRING) and field_2 (INT64).

CREATE EXTERNAL TABLE dataset.AutoHivePartitionedTable
WITH PARTITION COLUMNS
OPTIONS (
  uris = ['gs://bucket/path/*'],
  format = 'PARQUET',
  hive_partition_uri_prefix = 'gs://bucket/path',
  require_hive_partition_filter = false);

The following example creates an externally partitioned table by explicitly specifying the partition columns. This example assumes that the external file path has the pattern gs://bucket/path/field_1=first/field_2=1/data.parquet.

CREATE EXTERNAL TABLE dataset.CustomHivePartitionedTable
WITH PARTITION COLUMNS (
  field_1 STRING,  -- column order must match the external path
  field_2 INT64)
OPTIONS (
  uris = ['gs://bucket/path/*'],
  format = 'PARQUET',
  hive_partition_uri_prefix = 'gs://bucket/path',
  require_hive_partition_filter = false);
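
The following sketch shows the object_metadata option described above in use to create an object table over unstructured files. It is illustrative rather than definitive: the connection name, bucket path, and caching settings are assumptions.

CREATE EXTERNAL TABLE dataset.ObjectTable
WITH CONNECTION `myproject.us.myconnection`
OPTIONS (
  object_metadata = 'SIMPLE',
  uris = ['gs://bucket/images/*'],
  max_staleness = INTERVAL 4 HOUR,
  metadata_cache_mode = 'AUTOMATIC');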

CREATE FUNCTION statement

Creates a new user-defined function (UDF). BigQuery supports UDFs written in SQL, JavaScript, or Python.

Syntax

To create a SQL UDF, use the following syntax:

CREATE [OR REPLACE] [TEMPORARY | TEMP] FUNCTION [IF NOT EXISTS]
    [[project_name.]dataset_name.]function_name([named_parameter[, ...]])
  [RETURNS data_type]
  AS (sql_expression)
  [OPTIONS(function_option_list)]

named_parameter:
  param_name param_type

To create a JavaScript UDF, use the following syntax:

CREATE [OR REPLACE] [TEMPORARY | TEMP] FUNCTION [IF NOT EXISTS]
    [[project_name.]dataset_name.]function_name([named_parameter[, ...]])
  RETURNS data_type
  [determinism_specifier]
  LANGUAGE js
  [OPTIONS(function_option_list)]
  AS javascript_code

named_parameter:
  param_name param_type

determinism_specifier:
  { DETERMINISTIC | NOT DETERMINISTIC }

To create a Python UDF, use the following syntax:

Preview

This feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of the Service Specific Terms. Pre-GA features are available "as is" and might have limited support. For more information, see the launch stage descriptions.

Note: For support during the preview, email bq-python-udf-feedback@google.com.

CREATE [OR REPLACE] FUNCTION [IF NOT EXISTS]
    [project_name.]dataset_name.function_name([named_parameter[, ...]])
  RETURNS data_type
  LANGUAGE python
  [WITH CONNECTION connection_path]
  OPTIONS(function_option_list)
  AS python_code

named_parameter:
  param_name param_type

To create a remote function, use the following syntax:

CREATE [OR REPLACE] [TEMPORARY | TEMP] FUNCTION [IF NOT EXISTS]
    [[project_name.]dataset_name.]function_name([named_parameter[, ...]])
  RETURNS data_type
  REMOTE WITH CONNECTION connection_path
  [OPTIONS(function_option_list)]

named_parameter:
  param_name param_type

Routine names must contain only letters, numbers, and underscores, and be at most 256 characters long.

Arguments

  • OR REPLACE: Replaces any function with the same name if it exists. Cannot appear with IF NOT EXISTS.

  • IF NOT EXISTS: If any function exists with the same name, the CREATE statement has no effect. Cannot appear with OR REPLACE.

  • TEMP or TEMPORARY: Creates a temporary function. If the clause is not present, the statement creates a persistent UDF. You can reuse persistent UDFs across multiple queries, whereas you can only use temporary UDFs in a single query, script, session, or procedure.

  • project_name: For persistent functions, the name of the project where you are creating the function. Defaults to the project that runs the DDL query. Do not include the project name for temporary functions.

  • dataset_name: For persistent functions, the name of the dataset where you are creating the function. Defaults to the defaultDataset in the request. Do not include the dataset name for temporary functions.

  • function_name: The name of the function.

  • named_parameter: A comma-separated param_name and param_type pair. The value of param_type is a BigQuery data type. For a SQL UDF, the value of param_type can also be ANY TYPE.

  • determinism_specifier: Applies only to JavaScript UDFs. Provides a hint to BigQuery as to whether the query result can be cached. Can be one of the following values:

    • DETERMINISTIC: The function always returns the same result when passed the same arguments. The query result is potentially cacheable. For example, if the function add_one(i) always returns i + 1, the function is deterministic.

    • NOT DETERMINISTIC: The function does not always return the same result when passed the same arguments, and therefore is not cacheable. For example, if the function add_random(i) returns i + rand(), the function is not deterministic and BigQuery does not use cached results.

      If all of the invoked functions are DETERMINISTIC, BigQuery tries to cache the result, unless the results can't be cached for other reasons. For more information, see Using cached query results.

  • data_type: The data type that the function returns.

    • If the function is defined in SQL, then the RETURNS clause is optional. If the RETURNS clause is omitted, then BigQuery infers the result type of the function from the SQL function body when a query calls the function.

    • If the function is defined in JavaScript, then the RETURNS clause is required. For more information about allowed values for data_type, see Supported JavaScript UDF data types.

  • sql_expression: The SQL expression that defines the function.

  • function_option_list: A list of options for creating the function.

  • javascript_code: The definition of a JavaScript function. The value is a string literal. If the code includes quotes and backslashes, it must be either escaped or represented as a raw string. For example, the code return "\n"; can be represented as one of the following:

    • Quoted string: "return \"\\n\";". Both quotes and backslashes need to be escaped.
    • Triple quoted string: """return "\\n";""". Backslashes need to be escaped, quotes don't.
    • Raw string: r"""return "\n";""". No escaping is needed.
  • python_code: The definition of a Python function. The value is a string literal. If the code includes quotes and backslashes, it must be escaped or represented as a raw string. For example, the code return "\n"; can be represented as one of the following:

    • Quoted string: "return \"\\n\";". Both quotes and backslashes need to be escaped.
    • Triple quoted string: """return "\\n";""". Backslashes need to be escaped, quotes don't.
    • Raw string: r"""return "\n";""". No escaping is needed.
  • connection_name: Specifies a connection resource that has credentials for accessing the remote endpoint or for running Python code. Specify the connection name in the form project_name.location.connection_id. If the project name or location contains a dash, enclose the connection name in backticks (`).

function_option_list

The option list specifies options for creating a UDF. The following options are supported:

  • description (STRING): A description of the UDF. This option isn't supported when creating a temporary function.

  • library (ARRAY<STRING>): An array of JavaScript libraries to include in the function definition. Applies only to JavaScript and Python UDFs. For more information, see Including JavaScript libraries. Example: ["gs://my-bucket/lib1.js", "gs://my-bucket/lib2.js"]

  • endpoint (STRING): An HTTP endpoint of Cloud Functions. Applies only to remote functions. Example: "https://us-east1-your-project.cloudfunctions.net/foo". For more information, see Create a remote function.

  • user_defined_context (ARRAY<STRUCT<STRING, STRING>>): A list of key-value pairs that are sent with every HTTP request when the function is invoked. Applies only to remote functions. Example: [("key1","value1"),("key2", "value2")]

  • max_batching_rows (INT64): The maximum number of rows in each HTTP request. If not specified, BigQuery decides how many rows are included in an HTTP request. Applies only to remote functions and Python UDFs.

  • runtime_version (STRING): The name of the runtime version used to run the provided Python code. Applies only to Python UDFs. Example: python-3.11

  • entry_point (STRING): The name of the function defined in the Python code that serves as the entry point when the Python UDF is invoked. Applies only to Python UDFs.

  • packages (ARRAY<STRING>): An array of Python packages to install in the function definition. Applies only to Python UDFs. For more information, see Use third party packages. Example: ["pandas>=2.1", "google-cloud-translate==3.11"]

  • container_cpu (DOUBLE): Amount of CPU provisioned for a Python UDF container instance. Applies only to Python UDFs. For more information, see Configure container limits for Python UDF.

  • container_memory (STRING): Amount of memory provisioned for a Python UDF container instance. Applies only to Python UDFs. For more information, see Configure container limits for Python UDF.

Required permissions

This statement requires the following IAM permissions:

  • bigquery.routines.create: The dataset where you create the function.

In addition, the OR REPLACE clause requires bigquery.routines.update permission.

To create a remote function, additional IAM permissions are needed:

  • bigquery.connections.delegate: The connection that you use to create the remote function.

Examples

Create a SQL UDF

The following example creates a persistent SQL UDF named multiplyInputs in a dataset named mydataset.

CREATE FUNCTION mydataset.multiplyInputs(x FLOAT64, y FLOAT64)
RETURNS FLOAT64
AS (x * y);
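
Once created, the persistent UDF can be called from any query by its dataset-qualified name:

SELECT mydataset.multiplyInputs(3, 5) AS product;  -- returns 15.0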

Create a JavaScript UDF

The following example creates a temporary JavaScript UDF named multiplyInputs and calls it from inside a SELECT statement.

CREATE TEMP FUNCTION multiplyInputs(x FLOAT64, y FLOAT64)
RETURNS FLOAT64
LANGUAGE js
AS r"""
  return x*y;
""";

SELECT multiplyInputs(a, b) FROM (SELECT 3 AS a, 2 AS b);

Create a remote function

The following example creates a temporary remote function named tempRemoteMultiplyInputs in the US location, using a connection called myconnection in the US region.

CREATE TEMP FUNCTION tempRemoteMultiplyInputs(x FLOAT64, y FLOAT64)
RETURNS FLOAT64
REMOTE WITH CONNECTION us.myconnection
OPTIONS (endpoint = "https://us-central1-myproject.cloudfunctions.net/multiply");

The following example creates a persistent remote function named remoteMultiplyInputs in a dataset named mydataset using a connection called myconnection. The location and project of the dataset and the connection must match.

CREATE FUNCTION mydataset.remoteMultiplyInputs(x FLOAT64, y FLOAT64)
RETURNS FLOAT64
REMOTE WITH CONNECTION us.myconnection
OPTIONS (endpoint = "https://us-central1-myproject.cloudfunctions.net/multiply");

Create a Python UDF

The following example creates a Python UDF named multiplyInputs.

CREATE FUNCTION mydataset.multiplyInputs(x FLOAT64, y FLOAT64)
RETURNS FLOAT64
LANGUAGE python
OPTIONS (
  entry_point = 'multiply',
  runtime_version = 'python-3.11',
  packages = ['pandas==2.2'])
AS r"""
import pandas as pd

def multiply(df: pd.DataFrame):
  return df['x'] * df['y']
""";

CREATE AGGREGATE FUNCTION statement (SQL)

Creates a new SQL user-defined aggregate function (UDAF).

Syntax

To create a SQL UDAF, use the following syntax:

CREATE [OR REPLACE] [{TEMPORARY | TEMP}] AGGREGATE FUNCTION [IF NOT EXISTS]
    function_path([function_parameter[, ...]])
  [RETURNS return_data_type]
  AS (sql_function_body)
  [OPTIONS(function_option_list)]

function_path:
  [[project_name.]dataset_name.]function_name

function_parameter:
  parameter_name parameter_data_type [NOT AGGREGATE]

Arguments

  • OR REPLACE: Replaces any function with the same name if it exists. OR REPLACE can't appear with IF NOT EXISTS.
  • IF NOT EXISTS: If any function exists with the same name, the CREATE statement has no effect. IF NOT EXISTS can't appear with OR REPLACE.
  • TEMP or TEMPORARY: The function is temporary; that is, it exists for the lifetime of a single query, script, session, or procedure. A temporary function can't have the same name as a built-in function. If the names match, an error is produced. If TEMP or TEMPORARY is not included, a persistent function is created. You can reuse persistent functions across multiple queries.
  • function_path: The path where the function must be created and the name of the function.
    • project_name: For persistent functions, the name of the project where you are creating the function. Defaults to the project that runs the DDL query. Don't include the project name for temporary functions.
    • dataset_name: For persistent functions, the name of the dataset where you are creating the function. Defaults to defaultDataset in the request. Don't include the dataset name for temporary functions.
    • function_name: The name of the function. Function names must contain only letters, numbers, and underscores, and be at most 256 characters long.
  • function_parameter: A parameter for the function.
    • parameter_name: The name of the function parameter.
    • parameter_data_type: The GoogleSQL data type for the function parameter.
    • NOT AGGREGATE: The function parameter is not an aggregate. A non-aggregate function parameter can appear anywhere in the function definition.
  • return_data_type: The GoogleSQL data type that the function should return. GoogleSQL infers the result data type of the function from the function body when the RETURNS clause is omitted.
  • sql_function_body: The SQL expression that defines the function body.
  • function_option_list: A list of options for creating the function. For more information, see function_option_list.

function_option_list

The option list specifies options for creating a SQL UDAF. The following options are supported:

NAME | VALUE | Details
description

STRING

A description of the UDAF.

Required permissions

This statement requires the following IAM permissions:

Permission | Resource
bigquery.routines.create | The dataset where you create the function.

In addition, the OR REPLACE clause requires the bigquery.routines.update permission.

Examples

Create and call a SQL UDAF

The following example shows a persistent SQL UDAF that includes a non-aggregate function parameter. Inside the function definition, the aggregate SUM method takes the aggregate function parameter dividend, while the non-aggregate division operator (/) takes the non-aggregate function parameter divisor.

CREATE AGGREGATE FUNCTION myProject.myDataset.ScaledSum(
  dividend FLOAT64,
  divisor FLOAT64 NOT AGGREGATE)
RETURNS FLOAT64
AS (
  SUM(dividend) / divisor
);

-- Call the SQL UDAF.
SELECT ScaledSum(col1, 2) AS scaled_sum
FROM (
  SELECT 1 AS col1 UNION ALL
  SELECT 3 AS col1 UNION ALL
  SELECT 5 AS col1
);

/*------------*
 | scaled_sum |
 +------------+
 | 4.5        |
 *------------*/

CREATE AGGREGATE FUNCTION statement (JavaScript)

Creates a new JavaScript user-defined aggregate function (UDAF).

Syntax

To create a JavaScript UDAF, use the following syntax:

CREATE [OR REPLACE] [{TEMPORARY | TEMP}] AGGREGATE FUNCTION [IF NOT EXISTS]
  function_path([function_parameter[, ...]])
  RETURNS return_data_type
  LANGUAGE js
  [OPTIONS (function_option_list)]
  AS function_body

function_path:
  [[project_name.]dataset_name.]function_name

function_parameter:
  parameter_name parameter_data_type [NOT AGGREGATE]

Arguments

  • OR REPLACE: Replaces any function with the same name if it exists. OR REPLACE can't appear with IF NOT EXISTS.
  • IF NOT EXISTS: If any function exists with the same name, the CREATE statement has no effect. IF NOT EXISTS can't appear with OR REPLACE.
  • TEMP or TEMPORARY: The function is temporary; that is, it exists for the lifetime of a single query, script, session, or procedure. A temporary function can't have the same name as a built-in function. If the names match, an error is produced. If TEMP or TEMPORARY is not included, a persistent function is created. You can reuse persistent functions across multiple queries.
  • function_path: The path where the function must be created and the name of the function.
    • project_name: For persistent functions, the name of the project where you are creating the function. Defaults to the project that runs the DDL query. Don't include the project name for temporary functions.
    • dataset_name: For persistent functions, the name of the dataset where you are creating the function. Defaults to defaultDataset in the request. Don't include the dataset name for temporary functions.
    • function_name: The name of the function. Function names must contain only letters, numbers, and underscores, and be at most 256 characters long.
  • function_parameter: A parameter for the function.
    • parameter_name: The name of the function parameter.
    • parameter_data_type: The GoogleSQL data type for the function parameter.
    • NOT AGGREGATE: The function parameter is not an aggregate. Only one non-aggregate function parameter is allowed per JavaScript UDAF, and it must be the last parameter in the list.
  • return_data_type: The GoogleSQL data type that the function should return.
  • function_body: The JavaScript expression that defines the function body. For more information, see function_body.
  • function_option_list: A list of options for creating the function. For more information, see function_option_list.

function_body

The body of the JavaScript function must be a quoted string literal that represents the JavaScript code. To learn more about the different types of quoted string literals you can use, see Formats for quoted literals.

Only certain type encodings are allowed. To learn more, see SQL type encodings in a JavaScript UDAF.

The JavaScript function body must include four JavaScript functions that initialize, aggregate, merge, and finalize the results for the JavaScript UDAF. To learn more about the initialState, aggregate, merge, and finalize JavaScript functions, see Required aggregate functions in a JavaScript UDAF.

Only serialized data can be passed into the JavaScript aggregate functions. If you need to serialize data such as functions or symbols to pass them into the aggregate functions, use the JavaScript serialization functions. For more information, see Serialization functions for a JavaScript UDAF.

function_option_list

The option list specifies options for creating a JavaScript UDAF. The following options are supported:

NAME | VALUE | Details
description

STRING

A description of the UDAF.
library

ARRAY<STRING>

An array of JavaScript libraries to include in the JavaScript UDAF function body.

Example: ["gs://my-bucket/lib1.js", "gs://my-bucket/lib2.js"]

SQL type encodings in a JavaScript UDAF

In JavaScript UDAFs, GoogleSQL data types represent JavaScript data types in the following manner:

GoogleSQL data type | JavaScript data type | Notes
ARRAY | Array | An array of arrays is not supported. To get around this limitation, use the Array<Object<Array>> (JavaScript) and ARRAY<STRUCT<ARRAY>> (GoogleSQL) data types.
BIGNUMERIC | Number or String | Same as NUMERIC.
BOOL | Boolean |
BYTES | Uint8Array |
DATE | Date |
FLOAT64 | Number |
INT64 | BigInt |
JSON | Various types | The GoogleSQL JSON data type can be converted into a JavaScript Object, Array, or other GoogleSQL-supported JavaScript data type.
NUMERIC | Number or String | If a NUMERIC value can be represented exactly as an IEEE 754 floating-point value (range [-2^53, 2^53]) and has no fractional part, it is encoded as a Number data type; otherwise it is encoded as a String data type.
STRING | String |
STRUCT | Object | Each STRUCT field is a named property in the Object data type. An unnamed STRUCT field is not supported.
TIMESTAMP | Date | Date contains a microsecond field with the microsecond fraction of TIMESTAMP.

Note: The SQL type encodings for JavaScript UDAFs are different from those for JavaScript UDFs.

Required aggregate functions in a JavaScript UDAF

The JavaScript function body must include the following exportable JavaScript functions:

  • initialState function: Sets up the initial aggregation state of the UDAF and then returns the initial aggregation state.

    Syntax:

    export function initialState([nonAggregateParam]) {...}

    Parameters:

    • nonAggregateParam: Replace this parameter with a NOT AGGREGATE function parameter name.

    Examples:

    export function initialState() {...}
    export function initialState(initialSum) {...}
  • aggregate function: Aggregates one row of data, updating state to store the result of the aggregation. Doesn't return a value.

    Syntax:

    export function aggregate(state, aggregateParam[, ...][, nonAggregateParam]) {...}

    Parameters:

    • state: The aggregation state, which is the value returned by initialState on the first invocation, and the state as updated by the previous call to aggregate thereafter.

    • aggregateParam: The name of an aggregation parameter in the JavaScript UDAF. The argument for this parameter will be aggregated.

    • nonAggregateParam: Replace with a NOT AGGREGATE function parameter name.

    Example:

    export function aggregate(currentState, aggX, aggWeight, initialSum)
  • merge function: Combines two aggregation states from prior calls to the aggregate, merge, or initialState functions. This function does not return a value.

    Syntax:

    export function merge(state, partialState[, nonAggregateParam]) {...}

    Parameters:

    • state: The state into which partialState is merged.

    • partialState: The second aggregation state to merge.

    • nonAggregateParam: Replace with a NOT AGGREGATE function parameter name.

    Details:

    Depending on the size and organization of the underlying data being queried, the merge function might or might not be called. For example, if a particular set of data is small, or the data is partitioned in a way that results in small sets of data, the merge function won't be called.

    Example:

    export function merge(currentState, partialState, initialSum)
  • finalize function: Computes the final aggregation result and then returns this result for the UDAF.

    Syntax:

    export function finalize(state[, nonAggregateParam]) {...}

    Parameters:

    • state: The final aggregation state.

    • nonAggregateParam: Replace with a NOT AGGREGATE function parameter name.

    The final aggregation state is produced by the merge function (or the aggregate function if merge is never invoked). If the input is empty after NULL filtering, the final aggregation state is initialState.

    Example:

    export function finalize(finalState, initialSum)

Serialization functions for a JavaScript UDAF

If you want to work with non-serializable aggregation states, the JavaScript UDAF must provide the serialize and deserialize functions (a combined sketch follows this list):

  • serialize function: Converts an aggregation state into a BigQuery-serializable object. An object in JavaScript is BigQuery-serializable if all fields are a JavaScript primitive data type (for example, String, Number, null, undefined), another BigQuery-serializable object, or a JavaScript Array where all elements are either primitives or BigQuery-serializable objects.

    Syntax:

    export function serialize(state[, nonAggregateParam]) {...}

    Arguments:

    • state: The aggregation state to serialize.

    • nonAggregateParam: Replace with a NOT AGGREGATE function parameter name.

    Example:

    export function serialize(stateToSerialize, initialSum)
  • deserialize function: Converts a serialized state into an aggregation state. An aggregation state can be passed into the serialize, aggregate, merge, and finalize functions.

    Syntax:

    export function deserialize(serializedState[, nonAggregateParam]) {...}

    Arguments:

    • serializedState: The serialized state to convert into the aggregation state.

    • nonAggregateParam: Replace with a NOT AGGREGATE function parameter name.

    Example:

    export function deserialize(stateToDeserialize, initialSum)
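
As a minimal sketch of the pair, consider a hypothetical UDAF whose state keeps a running sum in a BigInt field; BigInt is not one of the BigQuery-serializable primitives, so it is converted to a String for serialization:

export function serialize(state) {
  // Convert the non-serializable BigInt field into a String.
  return {sum: state.sum.toString()};
}

export function deserialize(serializedState) {
  // Rebuild the BigInt field from its String form.
  return {sum: BigInt(serializedState.sum)};
}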

Required permissions

This statement requires the following IAM permissions:

Permission | Resource
bigquery.routines.create | The dataset where you create the function.

In addition, the OR REPLACE clause requires the bigquery.routines.update permission.

Examples

Calculate the positive sum of all rows

A JavaScript UDAF is similar to a JavaScript UDF, but defines an aggregate function instead of a scalar function. In the following example, a temporary JavaScript UDAF calculates the sum of all rows that have a positive value. The JavaScript UDAF body is quoted within a raw string:

CREATE TEMP AGGREGATE FUNCTION SumPositive(x FLOAT64)
RETURNS FLOAT64
LANGUAGE js
AS r'''
  export function initialState() {
    return {sum: 0}
  }
  export function aggregate(state, x) {
    if (x > 0) {
      state.sum += x;
    }
  }
  export function merge(state, partialState) {
    state.sum += partialState.sum;
  }
  export function finalize(state) {
    return state.sum;
  }
''';

-- Call the JavaScript UDAF.
WITH numbers AS (
  SELECT * FROM UNNEST([1.0, -1.0, 3.0, -3.0, 5.0, -5.0]) AS x)
SELECT SumPositive(x) AS sum FROM numbers;

/*-----*
 | sum |
 +-----+
 | 9.0 |
 *-----*/

Get the weighted average of all rows

A JavaScript UDAF can have aggregate and non-aggregate parameters. In the following example, the JavaScript UDAF calculates the weighted average for x after starting with an initial sum (initialSum). x and weight are aggregate parameters, and initialSum is a non-aggregate parameter:

CREATE OR REPLACE AGGREGATE FUNCTION my_project.my_dataset.WeightedAverage(
  x INT64,
  weight FLOAT64,
  initialSum FLOAT64 NOT AGGREGATE)
RETURNS INT64
LANGUAGE js
AS '''
  export function initialState(initialSum) {
    return {count: 0, sum: initialSum}
  }
  export function aggregate(state, x, weight) {
    state.count += 1;
    state.sum += Number(x) * weight;
  }
  export function merge(state, partialState) {
    state.sum += partialState.sum;
    state.count += partialState.count;
  }
  export function finalize(state) {
    return state.sum / state.count;
  }
''';

SELECT my_project.my_dataset.WeightedAverage(item, weight, 2) AS weighted_average
FROM (
  SELECT 1 AS item, 2.45 AS weight UNION ALL
  SELECT 3 AS item, 0.11 AS weight UNION ALL
  SELECT 5 AS item, 7.02 AS weight
);

/*------------------*
 | weighted_average |
 +------------------+
 | 13               |
 *------------------*/

CREATE TABLE FUNCTION statement

Creates a new table function, also called a table-valued function (TVF).

Syntax

CREATE [OR REPLACE] TABLE FUNCTION [IF NOT EXISTS]
  [[project_name.]dataset_name.]function_name([function_parameter[, ...]])
  [RETURNS TABLE <column_declaration[, ...]>]
  [OPTIONS (table_function_options_list)]
  AS sql_query

function_parameter:
  parameter_name {data_type | ANY TYPE | TABLE <column_declaration[, ...]>}

column_declaration:
  column_name data_type

Arguments

  • OR REPLACE: Replaces any table function with the same name if it exists. Cannot appear with IF NOT EXISTS.
  • IF NOT EXISTS: If any table function exists with the same name, the CREATE statement has no effect. Cannot appear with OR REPLACE.
  • project_name: The name of the project where you are creating the function. Defaults to the project that runs this DDL statement.
  • dataset_name: The name of the dataset where you are creating the function.
  • function_name: The name of the function to create.
  • function_parameter: A parameter for the function, specified as a parameter name and a data type. The value of data_type is a scalar BigQuery data type or ANY TYPE, or a table schema. Using table parameters in a table function is in Preview.
  • RETURNS TABLE: The schema of the table that the function returns, specified as a comma-separated list of column name and data type pairs. If RETURNS TABLE is absent, BigQuery infers the output schema from the query statement in the function body. If RETURNS TABLE is included, the names in the returned table type must match column names from the SQL query.
  • sql_query: Specifies the SQL query to run. The SQL query must include names for all columns.

table_function_options_list

The table_function_options_list lets you specify table function options. Table function options have the same syntax and requirements as table options but with a different list of NAMEs and VALUEs:

NAME | VALUE | Details
description

STRING

The description of the table function.

Details

BigQuery coerces argument types when possible. For example, if the parameter type is FLOAT64 and you pass an INT64 value, then BigQuery coerces it to a FLOAT64.

If a parameter type is ANY TYPE, the function accepts an input of any type for this argument. The type that you pass to the function must be compatible with the function definition. If you pass an argument with an incompatible type, the query returns an error. If more than one parameter has type ANY TYPE, BigQuery does not enforce any type relationship between them. If the parameter type is a table schema, the function accepts an input table containing a superset of those columns in any order.
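
To illustrate, a minimal sketch of an ANY TYPE parameter (the function and dataset names are hypothetical):

CREATE OR REPLACE TABLE FUNCTION mydataset.doubled(v ANY TYPE)
AS
SELECT v * 2 AS doubled_value;

-- Both calls work; the argument type determines the result type:
-- SELECT * FROM mydataset.doubled(5);
-- SELECT * FROM mydataset.doubled(2.5);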

Required permissions

This statement requires the following IAM permissions:

Permission | Resource
bigquery.routines.create | The dataset where you create the table function.

In addition, the OR REPLACE clause requires the bigquery.routines.update permission.

Examples

The following table function takes an INT64 parameter that is used to filter the results of a query:

CREATE OR REPLACE TABLE FUNCTION mydataset.names_by_year(y INT64)
AS
SELECT year, name, SUM(number) AS total
FROM `bigquery-public-data.usa_names.usa_1910_current`
WHERE year = y
GROUP BY year, name

The following example specifies the return TABLE type in the RETURNS clause:

CREATE OR REPLACE TABLE FUNCTION mydataset.names_by_year(y INT64)
RETURNS TABLE<name STRING, year INT64, total INT64>
AS
SELECT year, name, SUM(number) AS total
FROM `bigquery-public-data.usa_names.usa_1910_current`
WHERE year = y
GROUP BY year, name

The following example computes total sales for items with the name item_name from the orders table:

CREATE TABLE FUNCTION mydataset.compute_sales(
  orders TABLE<item STRING, sales INT64>,
  item_name STRING)
AS (
  SELECT SUM(sales) AS total_sales, item
  FROM orders
  WHERE item = item_name
  GROUP BY item
);

CREATE PROCEDURE statement

Creates a new procedure, which is a block of statements that can be called from other queries. Procedures can call themselves recursively.

Syntax

To create a GoogleSQL stored procedure, use the following syntax:

CREATE [OR REPLACE] PROCEDURE [IF NOT EXISTS]
  [[project_name.]dataset_name.]procedure_name(procedure_argument[, ...])
  [OPTIONS (procedure_option_list)]
BEGIN
  multi_statement_query
END;

procedure_argument:
  [procedure_argument_mode] argument_name argument_type

procedure_argument_mode:
  IN | OUT | INOUT

To create a stored procedure for Apache Spark, use the following syntax:

CREATE [OR REPLACE] PROCEDURE [IF NOT EXISTS]
  [[project_name.]dataset_name.]procedure_name(procedure_argument[, ...])
  [EXTERNAL SECURITY external_security]
  WITH CONNECTION connection_project_id.connection_region.connection_id
  [OPTIONS (procedure_option_list)]
  LANGUAGE language
  [AS pyspark_code]

procedure_argument:
  [procedure_argument_mode] argument_name argument_type

procedure_argument_mode:
  IN | OUT | INOUT

external_security:
  INVOKER

Arguments

  • OR REPLACE: Replaces any procedure with the same name if it exists. Cannot appear with IF NOT EXISTS.

  • IF NOT EXISTS: If any procedure exists with the same name, the CREATE statement has no effect. Cannot appear with OR REPLACE.

  • project_name: The name of the project where you are creating the procedure. Defaults to the project that runs this DDL query. If the project name contains special characters such as colons, quote it in backticks ` (example: `google.com:my_project`).

  • dataset_name: The name of the dataset where you are creating the procedure. Defaults to the defaultDataset in the request.

  • procedure_name: The name of the procedure to create.

  • external_security: If set to INVOKER, the procedure executes with the privileges of the user that calls it.

  • connection_project_id: the project that contains the connection to run Spark procedures (for example, myproject).

  • connection_region: the region that contains the connection to run Spark procedures (for example, us).

  • connection_id: the connection ID (for example, myconnection).

    When you view the connection details in the Google Cloud console, the connection ID is the value in the last section of the fully qualified connection ID that is shown in Connection ID (for example, projects/myproject/locations/connection_location/connections/myconnection).

    For more information, see Create a stored procedure for Apache Spark.

  • multi_statement_query: The multi-statement query to run.

  • language: The language in which the stored procedure for Apache Spark is written. BigQuery supports stored procedures for Apache Spark that are written in Python, Java, or Scala.

  • pyspark_code: The PySpark code for the stored procedure for Apache Spark if you want to pass the body of the procedure inline. Cannot appear with main_file_uri in procedure_option_list.

  • argument_type: Any valid BigQuery type.

  • procedure_argument_mode: Specifies whether an argument is an input, an output, or both.

procedure_option_list

The procedure_option_list lets you specify procedure options. Procedure options have the same syntax and requirements as table options but with a different list of NAMEs and VALUEs:

NAME | VALUE | Details
strict_mode

BOOL

If strict_mode is TRUE, the procedure body undergoes additional checks for errors such as non-existent tables or columns. The CREATE PROCEDURE statement fails if the body fails any of these checks. Strict mode is useful for catching many common types of errors, but the checks are not exhaustive, and successful creation of a procedure with strict_mode doesn't guarantee that the procedure will successfully execute at runtime.

If strict_mode is FALSE, the procedure body is checked only for syntax. Procedures that invoke themselves recursively should be created with strict_mode=FALSE to avoid errors caused by the procedure not yet existing while it is being validated.

The default value is TRUE.

Example: strict_mode=FALSE
description

STRING

A description of the procedure.
Example: description="A procedure that runs a query."
engine

STRING

The engine type for processing stored procedures for Apache Spark. Must be specified for stored procedures for Spark.

Valid value: engine="SPARK"
runtime_version

STRING

The runtime version of stored procedures for Spark.

If not specified, the system default runtime version is used. Stored procedures for Spark support the same list of runtime versions as Serverless for Apache Spark. However, we recommend specifying a runtime version. For more information, see Serverless for Apache Spark runtime releases.

Example: runtime_version="1.1"
container_image

STRING

Custom container image for the runtime environment of the stored procedure for Spark.

If not specified, the system default container image that includes the default Spark, Java, and Python packages associated with a runtime version is used.

You can provide a custom container Docker image that includes your own built Java or Python dependencies. Because Spark is mounted into your custom container at runtime, you must omit Spark from your custom container image.

For optimized performance, we recommend hosting your image in Artifact Registry. For more information, see Use custom containers with Serverless for Apache Spark.

Example: container_image="us-docker.pkg.dev/my-project-id/my-images/my-image"

properties

ARRAY<STRUCT<STRING, STRING>>

An array of key-value pairs that sets properties for stored procedures for Spark.

Stored procedures for Spark support most of the Spark properties and a list of Serverless for Apache Spark properties. If you specify unsupported Spark properties, such as YARN-related Spark properties, BigQuery fails to create the stored procedure. You can add Spark properties using the following format: [("key1","value1"),("key2", "value2")]

For example:

bq query --nouse_legacy_sql --dry_run 'CREATE PROCEDURE my_bq_project.my_dataset.spark_proc()
WITH CONNECTION `my-project-id.us.my-connection`
OPTIONS(engine="SPARK",
main_file_uri="gs://my-bucket/my-pyspark-main.py",
properties=[("spark.executor.instances", "3"),("spark.yarn.am.memory", "3g")])
LANGUAGE PYTHON'

# Error in query string: Invalid value: Invalid properties:
# Attempted to set unsupported properties: [spark:spark.yarn.am.memory] at [1:1]
Note: You can use the BigQuery dry run feature to validate your stored procedure without creating it.
main_file_uri

STRING

The Cloud Storage URI of the main Python, Scala, or Java JAR file of the Spark application. Applies only to stored procedures for Spark.

Alternatively, if you want to add the body of the stored procedure that's written in Python in the CREATE PROCEDURE statement, add the code after LANGUAGE PYTHON AS as shown in the example in Use inline code.

Example: main_file_uri="gs://my-bucket/my-pyspark-main.py"

For the Scala and Java languages, this field contains a path to only one JAR file. You can set only one of main_file_uri and main_class.

Example: main_file_uri="gs://my-bucket/my-scala-main.jar"
main_class

STRING

Applies only to stored procedures for Spark written in Java and Scala. Specifies a fully-qualified class name in a JAR set with the jar_uris option. You can set only one of main_file_uri and main_class.

Example: main_class="com.example.wordcount"
py_file_uris

ARRAY<STRING>

Python files to be placed on the PYTHONPATH for a PySpark application. Applies only to stored procedures for Apache Spark written in Python.

Optional. Cloud Storage URIs of Python files to pass to the PySpark framework. Supported file formats include the following: .py, .egg, and .zip.

Example: py_file_uris=["gs://my-bucket/my-pyspark-file1.py", "gs://my-bucket/my-pyspark-file2.py"]
jar_uris

ARRAY<STRING>

Paths to the JAR files to include on the driver and executor classpaths. Applies only to stored procedures for Apache Spark.

Optional. Cloud Storage URIs of JAR files to add to the classpath of the Spark driver and tasks.

Example:
jar_uris=["gs://my-bucket/my-lib1.jar", "gs://my-bucket/my-lib2.jar"]
file_uris

ARRAY<STRING>

Files to be placed in the working directory of each executor. Applies only to stored procedures for Apache Spark.

Optional. Cloud Storage URIs of files to be placed in the working directory of each executor. Example:
file_uris=["gs://my-bucket/my-file1", "gs://my-bucket/my-file2"]
archive_uris

ARRAY<STRING>

Archive files to be extracted into the working directory of each executor. Applies only to stored procedures for Apache Spark.

Optional. Cloud Storage URIs of archives to be extracted into the working directory of each executor. Supported file formats include the following: .jar, .tar, .tar.gz, .tgz, and .zip.

Example:
archive_uris=["gs://my-bucket/my-archive1.zip", "gs://my-bucket/my-archive2.zip"]

Argument mode

IN indicates that the argument is only an input to the procedure. You can specify either a variable or a value expression for IN arguments.

OUT indicates that the argument is an output of the procedure. An OUT argument is initialized to NULL when the procedure starts. You must specify a variable for OUT arguments.

INOUT indicates that the argument is both an input to and an output from the procedure. You must specify a variable for INOUT arguments. An INOUT argument can be referenced in the body of a procedure as a variable and assigned new values.

If none of IN, OUT, or INOUT is specified, the argument is treated as an IN argument.

Variable scope

If a variable is declared outside a procedure, passed as an INOUT or OUT argument to a procedure, and the procedure assigns a new value to that variable, that new value is visible outside of the procedure.

Variables declared in a procedure are not visible outside of the procedure,and vice versa.

An OUT or INOUT argument can be assigned a value using SET, in which case the modified value is visible outside of the procedure. If the procedure exits successfully, the value of the OUT or INOUT argument is the final value assigned to that variable.

Temporary tables exist for the duration of the script, so if a procedure creates a temporary table, the caller of the procedure can reference the temporary table as well.
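
A minimal sketch of this behavior (the procedure and table names are hypothetical):

CREATE PROCEDURE mydataset.MakeTempTable()
BEGIN
  CREATE TEMP TABLE TempResults AS SELECT 1 AS id;
END;

-- Later in the same script, the caller can reference the temporary table
-- created inside the procedure:
CALL mydataset.MakeTempTable();
SELECT * FROM TempResults;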

Default project in procedure body

Procedure bodies can reference entities without specifying the project; the default project is the project that owns the procedure, not necessarily the project used to run the CREATE PROCEDURE statement. Consider the following sample query.

CREATE PROCEDURE myProject.myDataset.QueryTable()
BEGIN
  SELECT * FROM anotherDataset.myTable;
END;

After creating the above procedure, you can run the query CALL myProject.myDataset.QueryTable(). Regardless of the project you choose to run this CALL query, the referenced table anotherDataset.myTable is always resolved against project myProject.

Required permissions

This statement requires the following IAM permission:

Permission | Resource
bigquery.routines.create | The dataset where you create the procedure.

To create a stored procedure for Apache Spark, additional IAM permissions are needed:

Permission | Resource
bigquery.connections.delegate | The connection which you use to create the stored procedure for Apache Spark.

In addition, the OR REPLACE clause requires the bigquery.routines.update permission.

SQL examples

You can also see examples of stored procedures for Apache Spark.

The following example creates a SQL procedure that both takes x as an input argument and returns x as an output; because no argument mode is present for the argument delta, it is an input argument. The procedure consists of a block containing a single statement, which assigns the sum of the two input arguments to x.

CREATE PROCEDURE mydataset.AddDelta(INOUT x INT64, delta INT64)
BEGIN
  SET x = x + delta;
END;

The following example calls the AddDelta procedure from the example above, passing it the variable accumulator both times; because the changes to x within AddDelta are visible outside of AddDelta, these procedure calls increment accumulator by a total of 8.

DECLARE accumulator INT64 DEFAULT 0;
CALL mydataset.AddDelta(accumulator, 5);
CALL mydataset.AddDelta(accumulator, 3);
SELECT accumulator;

This returns the following:

+-------------+
| accumulator |
+-------------+
| 8           |
+-------------+

The following example creates the procedure SelectFromTablesAndAppend, which takes target_date as an input argument and returns rows_added as an output. The procedure creates a temporary table DataForTargetDate from a query; then, it calculates the number of rows in DataForTargetDate and assigns the result to rows_added. Next, it inserts a new row into TargetTable, passing the value of target_date as one of the column names. Finally, it drops the table DataForTargetDate and returns rows_added.

CREATE PROCEDURE mydataset.SelectFromTablesAndAppend(
  target_date DATE, OUT rows_added INT64)
BEGIN
  CREATE TEMP TABLE DataForTargetDate AS
  SELECT t1.id, t1.x, t2.y
  FROM dataset.partitioned_table1 AS t1
  JOIN dataset.partitioned_table2 AS t2
  ON t1.id = t2.id
  WHERE t1.date = target_date
    AND t2.date = target_date;

  SET rows_added = (SELECT COUNT(*) FROM DataForTargetDate);

  SELECT id, x, y, target_date  -- note that target_date is a parameter
  FROM DataForTargetDate;

  DROP TABLE DataForTargetDate;
END;

The following example declares a variable rows_added, then passes it as an argument to the SelectFromTablesAndAppend procedure from the previous example, along with the value of CURRENT_DATE; then it returns a message stating how many rows were added.

DECLARE rows_added INT64;
CALL mydataset.SelectFromTablesAndAppend(CURRENT_DATE(), rows_added);
SELECT FORMAT('Added %d rows', rows_added);

CREATE ROW ACCESS POLICY statement

Creates or replaces a row-level access policy. Row-level access policies on a table must have unique names.

Syntax

CREATE [OR REPLACE] ROW ACCESS POLICY [IF NOT EXISTS]
  row_access_policy_name ON table_name
  [GRANT TO (grantee_list)]
  FILTER USING (filter_expression);

Arguments

  • IF NOT EXISTS: If any row-level access policy exists with the same name, the CREATE statement has no effect. Cannot appear with OR REPLACE.

  • row_access_policy_name: The name of the row-level access policy that you are creating. The row-level access policy name must be unique for each table. The row-level access policy name can contain the following:

    • Up to 256 characters.
    • Letters (upper or lowercase), numbers, and underscores. Must start with a letter.
  • table_name: The name of the table that you want to create a row-level access policy for. The table must already exist.

  • GRANT TO grantee_list: An optional clause that specifies the initial members that the row-level access policy should be created with.

    Caution: If no grantee_list is provided, then the row-level access policy for the specified filter is initialized with no principals. This configuration prevents all data reads by everyone.

    grantee_list is a list of iam_member users or groups. Strings must be valid IAM principals, or members, following the format of an IAM Policy Binding member, and must be quoted. The following types are supported:

    grantee_list types:

    user:{emailid} | An email address that represents a specific Google account. Example: user:alice@example.com

    serviceAccount:{emailid} | An email address that represents a service account. Example: serviceAccount:my-other-app@appspot.gserviceaccount.com

    group:{emailid} | An email address that represents a Google group. Example: group:admins@example.com

    domain:{domain} | The Google Workspace domain (primary) that represents all the users of that domain. Example: domain:example.com

    allAuthenticatedUsers | A special identifier that represents all service accounts and all users on the internet who have authenticated with a Google Account. This identifier includes accounts that aren't connected to a Google Workspace or Cloud Identity domain, such as personal Gmail accounts. Users who aren't authenticated, such as anonymous visitors, aren't included.

    allUsers | A special identifier that represents anyone who is on the internet, including authenticated and unauthenticated users. Because BigQuery requires authentication before a user can access the service, allUsers includes only authenticated users.

    You can combine a series of iam_member values, if they are comma-separated and quoted separately. For example: "user:alice@example.com","group:admins@example.com","user:sales@example.com"

  • filter_expression: Defines the subset of table rows to show only to the members of the grantee_list. The filter_expression is similar to the WHERE clause in a SELECT query (a combined example follows this list).

    The following are valid filter expressions:

    • GoogleSQL scalar functions.
    • SESSION_USER(), to restrict access only to rows that belong to the user running the query. If none of the row-level access policies are applicable to the querying user, then the user has no access to the data in the table.
    • TRUE. Grants the principals in the grantee_list field access to all rows of the table.

    The filter expression cannot contain the following:

    • References to other tables, such as subqueries.
    • SQL statements such as SELECT, CREATE, or UPDATE.
    • User-defined functions.
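
For illustration, a sketch that combines a grantee list with a filter (the table, column, and principals are hypothetical):

CREATE ROW ACCESS POLICY us_filter
ON mydataset.my_table
GRANT TO ('user:alice@example.com', 'group:sales@example.com')
FILTER USING (region = 'US');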

Required permissions

This statement requires the following IAM permissions:

Permission | Resource
bigquery.rowAccessPolicies.create | The target table.
bigquery.rowAccessPolicies.setIamPolicy | The target table.
bigquery.tables.getData | The target table.

CREATE CAPACITY statement

Purchases slots by creating a new capacity commitment.

Caution: Before you purchase slots, understand the details of the commitment plans and pricing.

Syntax

CREATE CAPACITY `project_id.location_id.commitment_id`
OPTIONS (capacity_commitment_option_list);

Arguments

  • project_id: The project ID of the administration project that will maintain ownership of this commitment.
  • location_id: The location of the commitment.
  • commitment_id: The ID of the commitment. The value must be unique to the project and location. It must start and end with a lowercase letter or a number and contain only lowercase letters, numbers, and dashes.
  • capacity_commitment_option_list: The options you can set to describe the capacity commitment.

capacity_commitment_option_list

The option list specifies options for the capacity commitment. Specify the options in the following format: NAME=VALUE, ...

The following options are supported:

NAME | TYPE | Details
plan | String | The commitment plan to purchase. Supported values include ANNUAL, THREE_YEAR, and TRIAL. For more information, see Commitment plans.
renewal_plan | String | The commitment renewal plan. Applies only when plan is ANNUAL, THREE_YEAR, or TRIAL. For more information, see Renewing commitments.
slot_count | Integer | The number of slots in the commitment.
edition | String | The edition associated with this commitment. For more information about editions, see Introduction to BigQuery editions.

Required permissions

This statement requires the following IAM permissions:

Permission | Resource
bigquery.capacityCommitments.create | The administration project that maintains ownership of the commitments.

Example

The following example creates a capacity commitment of 100 annual slots that are located in the region-us region and managed by the admin_project project:

CREATE CAPACITY `admin_project.region-us.my-commitment`
OPTIONS (
  slot_count = 100,
  plan = 'ANNUAL');

CREATE RESERVATION statement

Creates a reservation. For more information, see Introduction to Reservations.

Syntax

CREATE RESERVATION `project_id.location_id.reservation_id`
OPTIONS (reservation_option_list);

Arguments

  • project_id: The project ID of the administration project where the capacity commitment was created.
  • location_id: The location of the reservation.
  • reservation_id: The reservation ID.
  • reservation_option_list: The options you can set to describe the reservation.

reservation_option_list

The option list specifies options for the reservation. Specify the options in the following format: NAME=VALUE, ...

The following options are supported:

NAME | TYPE | Details
ignore_idle_slots | BOOLEAN | If the value is true, then the reservation uses only the slots that are provisioned to it. The default value is false. For more information, see Idle slots.
slot_capacity | INTEGER | The number of slots to allocate to the reservation. If this reservation was created with an edition, this is equivalent to the amount of baseline slots.
target_job_concurrency | INTEGER | A soft upper bound on the number of jobs that can run concurrently in this reservation.
edition | STRING | The edition associated with this reservation. For more information about editions, see Introduction to BigQuery editions.
autoscale_max_slots | INTEGER | The maximum number of slots that could be added to the reservation by autoscaling.
secondary_location | STRING | The secondary location to use in the case of disaster recovery.
max_slots | INTEGER | The maximum number of slots the reservation can consume. For more details about predictable reservations, see Reservation predictability.
scaling_mode | STRING | The scaling mode of the reservation. This value must be configured together with max_slots. Also, this value must be aligned with ignore_idle_slots. For details, see Reservation predictability.
labels | ARRAY<STRUCT<STRING, STRING>> | An array of labels for the reservation, expressed as key-value pairs.

Required permissions

This statement requires the following IAM permissions:

Permission | Resource
bigquery.reservations.create | The administration project that maintains ownership of the commitments.

Example

The following example creates a reservation of 100 slots in the project admin_project:

CREATE RESERVATION `admin_project.region-us.prod`
OPTIONS (slot_capacity = 100);
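
As a further sketch (the reservation ID and slot values are illustrative), a reservation that uses an edition with autoscaling might look like the following:

CREATE RESERVATION `admin_project.region-us.autoscale_prod`
OPTIONS (
  edition = 'ENTERPRISE',
  slot_capacity = 100,
  autoscale_max_slots = 300);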

CREATE ASSIGNMENT statement

Assigns a project, folder, or organization to a reservation.

Syntax

CREATE ASSIGNMENT `project_id.location_id.reservation_id.assignment_id`
OPTIONS (assignment_option_list)

Arguments

  • project_id: The project ID of the administration project where the reservation was created.
  • location_id: The location of the reservation.
  • reservation_id: The reservation ID.
  • assignment_id: The ID of the assignment. The value must be unique to the project and location. It must start and end with a lowercase letter or a number and contain only lowercase letters, numbers, and dashes.
  • assignment_option_list: The options you can set to describe the assignment.

To remove a project from any reservations and use on-demand billing instead, set reservation_id to none.
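
For illustration (the assignment ID is hypothetical), such an assignment might look like the following:

CREATE ASSIGNMENT `admin_project.region-us.none.my_assignment`
OPTIONS (
  assignee = 'projects/my_project',
  job_type = 'QUERY');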

assignment_option_list

The option list specifies options for the assignment. Specify the options in the following format: NAME=VALUE, ...

The following options are supported:

NAME | TYPE | Details
assignee | String | The ID of the project, folder, or organization to assign to the reservation.
job_type | String | The type of job to assign to this reservation. Supported values include QUERY, PIPELINE, ML_EXTERNAL, CONTINUOUS, and BACKGROUND. For more information, see Reservation assignments.

Required permissions

This statement requires the following IAM permissions:

Permission | Resource
bigquery.reservationAssignments.create | The administration project and the assignee.

Example

The following example assigns the project my_project to the prod reservation for query jobs:

CREATE ASSIGNMENT `admin_project.region-us.prod.my_assignment`
OPTIONS (
  assignee = 'projects/my_project',
  job_type = 'QUERY');

The following example assigns an organization to the prod reservation for pipeline jobs, such as load and extract jobs:

CREATE ASSIGNMENT `admin_project.region-us.prod.my_assignment`
OPTIONS (
  assignee = 'organizations/1234',
  job_type = 'PIPELINE');

CREATE SEARCH INDEX statement

Creates a new search index on one or more columns of a table.

A search index enables efficient queries using the SEARCH function.

Syntax

CREATE SEARCH INDEX [IF NOT EXISTS] index_name
ON table_name({ALL COLUMNS [WITH COLUMN OPTIONS (column[, ...])] | column[, ...]})
[OPTIONS (index_option_list)]

column:
  column_name [OPTIONS (index_column_option_list)]

Arguments

  • IF NOT EXISTS: If there is already a search index by that name on the table, do nothing. If the table has a search index by a different name, then return an error.

  • index_name: The name of the search index you're creating. Since the search index is always created in the same project and dataset as the base table, there is no need to specify these in the name.

  • table_name: The name of the table. See Table path syntax.

  • ALL COLUMNS: If data types are not specified, creates a search index on every column in the table which contains a STRING field. If data types are specified, creates a search index on every column in the table which matches any of the data types specified.

  • WITH COLUMN OPTIONS: Can only be used with ALL COLUMNS to set options on specific indexed columns.

  • column_name: The name of a top-level column in the table which is one of the following supported data types or contains a field with one of the supported data types:

    Supported data types | Notes
    STRING | Primitive data type.
    INT64 | Primitive data type.
    TIMESTAMP | Primitive data type.
    ARRAY<PRIMITIVE_DATA_TYPE> | Must contain a primitive data type in this list.
    STRUCT or ARRAY<STRUCT> | Must contain at least one nested field that is a primitive data type in this list or ARRAY<PRIMITIVE_DATA_TYPE>.
    JSON | Must contain at least one nested field of a type that matches any data types in this list.

  • index_column_option_list: The list of options to set on indexed columns.

  • index_option_list: The list of options to set on the search index.

Details

You can create only one search index per base table. You cannot create a search index on a view or materialized view. To modify which columns are indexed, DROP the current index and create a new one.

BigQuery returns an error if any column_name is not a STRING or does not contain a STRING field, or if you call CREATE SEARCH INDEX on ALL COLUMNS of a table which contains no STRING fields.

Creating a search index fails on a table which has column ACLs or row filters; however, these may all be added to the table after creation of the index.

index_option_list

The option list specifies options for the search index. Specify the options in the following format: NAME=VALUE, ...

The following options are supported:

NAME | VALUE | Details

analyzer

STRING

Example: analyzer='LOG_ANALYZER'

The text analyzer to use to generate tokens for the search index. The supported values are 'LOG_ANALYZER', 'NO_OP_ANALYZER', and 'PATTERN_ANALYZER'.

analyzer_options

JSON-formatted STRING

The text analyzer configurations to set when creating a search index. Supported when analyzer is equal to 'LOG_ANALYZER' or 'PATTERN_ANALYZER'. For examples of JSON-formatted strings with different text analyzers, see Work with text analyzers.

data_types

ARRAY<STRING>

Example: data_types=['STRING', 'INT64', 'TIMESTAMP']

An array of data types to set when creating a search index. Supported data types are STRING, INT64, and TIMESTAMP. If data_types is not set, STRING fields are indexed by default.

default_index_column_granularity

STRING

In Preview.

Example: default_index_column_granularity='GLOBAL'

The default granularity of information to store for each indexed column. The supported values are 'GLOBAL' (default) and 'COLUMN'. For more information, see Index with column granularity.

index_column_option_list

NAME | VALUE | Details

index_granularity

STRING

In Preview.

Example: index_granularity='GLOBAL'

The granularity of information to store for the indexed column. This setting overrides the default granularity specified in the default_index_column_granularity field of the index options. The supported values are 'GLOBAL' (default) and 'COLUMN'. For more information, see Index with column granularity.

Required permissions

This statement requires the following IAM permissions:

Permission | Resource
bigquery.tables.createIndex | The base table where you create the index.

Examples

The following example creates a search index called my_index on all string columns of my_table. In this case, the index is only created on column a.

CREATE TABLE dataset.my_table(a STRING, b INT64);

CREATE SEARCH INDEX my_index
ON dataset.my_table(ALL COLUMNS);

The following example creates a search index on columns a, my_struct.string_field, and b that uses the NO_OP_ANALYZER text analyzer. It sets the default index column granularity to COLUMN and overrides the setting for column a to GLOBAL.

CREATE TABLE dataset.complex_table(
  a STRING,
  my_struct STRUCT<string_field STRING, int_field INT64>,
  b ARRAY<STRING>);

CREATE SEARCH INDEX my_index
ON dataset.complex_table(
  a OPTIONS (index_granularity = 'GLOBAL'),
  my_struct,
  b)
OPTIONS (
  analyzer = 'NO_OP_ANALYZER',
  default_index_column_granularity = 'COLUMN');
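
As one more sketch, an index restricted to specific data types through the data_types option (reusing my_table from the first example, and assuming the table doesn't already have a search index):

CREATE SEARCH INDEX my_index
ON dataset.my_table(ALL COLUMNS)
OPTIONS (data_types = ['STRING', 'INT64']);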

CREATE VECTOR INDEX statement

Creates a new vector index on a column of a table.

A vector index lets you perform a vector search more quickly, with the trade-off of reducing recall and so returning more approximate results.

Syntax

CREATE [OR REPLACE] VECTOR INDEX [IF NOT EXISTS] index_name
ON table_name(column_name)
[STORING (stored_column_name[, ...])]
[PARTITION BY partition_expression]
OPTIONS (index_option_list);

Arguments

  • OR REPLACE: Replaces any vector index with the same name if it exists. Can't appear with IF NOT EXISTS.

  • IF NOT EXISTS: If there is already a vector index by that name on the table, do nothing. If the table has a vector index by a different name, then return an error.

  • index_name: The name of the vector index you're creating. Since the index is always created in the same project and dataset as the base table, there is no need to specify these in the name.

  • table_name: The name of the table. See Table path syntax.

  • column_name: The name of a column with a type of ARRAY<FLOAT64>, or, if you're using autonomous embedding generation (Preview), a STRUCT<result ARRAY<FLOAT64>, status STRING> column. If the column type is ARRAY<FLOAT64>, then all elements in the array must be non-NULL, and all values in the column must have the same array dimensions.

  • stored_column_name: The name of a top-level column in the table to store in the vector index. The column type can't be RANGE. Stored columns are not used if the table has a row-level access policy or the column has a policy tag. To learn more, see Store columns and pre-filter.

  • partition_expression: An expression that determines how to partition the vector index. You can only partition TreeAH indexes. (Preview)

  • index_option_list: The list of options to set on the vector index.

Details

You can only create vector indexes on standard tables.

You can create only one vector index per table. You can't create a vector index on a table that already has a search index with the same index name.

To modify which column is indexed, DROP the current index and create a new one.

index_option_list

The option list specifies options for the vector index. Specify the options in the following format: NAME=VALUE, ...

The following options are supported:

NAME | VALUE | Details

index_type

STRING

Required. The algorithm to use to build the vector index. The supported values are IVF and TREE_AH.

IVF: Specifying IVF builds the vector index as an inverted file index (IVF). An IVF uses a k-means algorithm to cluster the vector data, and then partitions the vector data based on those clusters. When you use the VECTOR_SEARCH function to search the vector data, it can use these partitions to reduce the amount of data it needs to read in order to determine a result.

TREE_AH: Uses Google's ScaNN algorithm. TREE_AH is a tree-quantization based index, leveraging k-means clustering for partitioning and asymmetric hashing (product quantization) for fast approximate distance computation. For more information, see TreeAH index.

distance_type

STRING

Specifies the default distance type to use when performing a vector search using this index. The supported values are EUCLIDEAN, COSINE, and DOT_PRODUCT. EUCLIDEAN is the default.

The index creation itself always uses EUCLIDEAN distance for training, but the distance used in the VECTOR_SEARCH function can be different.

If you specify a value for the distance_type argument of the VECTOR_SEARCH function, that value is used instead of the vector index's distance_type value.

ivf_options

JSON-formatted STRING

The options to use with the IVF algorithm. Defaults to '{}' to denote that all underlying options use their corresponding default values.

The only supported option is num_lists. Specify an INT64 less than or equal to 5,000 that determines how many lists the IVF algorithm creates. For example, ivf_options = '{"num_lists":1000}'.

During indexing, vectors are assigned to the list corresponding to their nearest cluster centroid. If you omit this argument, BigQuery determines a default value based on your data characteristics. The default value works well for most use cases.

num_lists controls query tuning granularity. Higher values create more lists, so you can set the fraction_lists_to_search option of the VECTOR_SEARCH function to scan a smaller percentage of the index. For example, scanning 1% of 100 lists as opposed to scanning 10% of 10 lists. This enables finer control of the search speed and recall but slightly increases the indexing cost. Set this argument value based on how precisely you need to tune query scope.

The statement fails if ivf_options is specified and index_type is not IVF.

tree_ah_options

JSON-formatted STRING

The options to use with the TREE_AH algorithm. Defaults to '{}' to denote that all underlying options use their corresponding default values.

Two options are supported: leaf_node_embedding_count and normalization_type.

leaf_node_embedding_count is an INT64 value greater than or equal to 500 that specifies the approximate number of vectors in each leaf node of the tree that the TreeAH algorithm creates. The TreeAH algorithm divides the whole data space into a number of lists, with each list containing approximately leaf_node_embedding_count data points. A lower value creates more lists with fewer data points, while a larger value creates fewer lists with more data points. The default is 1,000, which is appropriate for most datasets.

normalization_type: the type of normalization performed on each base table and query vector prior to any processing. The supported values are NONE and L2. L2 is also referred to as the Euclidean norm. Defaults to NONE. Normalization happens before any processing, for both the base table data and the query data, but doesn't modify the embedding column in the table. Depending on the dataset, the embedding model, and the distance type used during VECTOR_SEARCH, normalizing the embeddings might improve recall.

For example: tree_ah_options = '{"leaf_node_embedding_count": 1000, "normalization_type": "L2"}'

The statement fails if tree_ah_options is specified and index_type is not TREE_AH.

Required permissions

This statement requires the following IAM permissions:

Permission | Resource
bigquery.tables.createIndex | The table where you create the vector index.

If you choose to use the OR REPLACE clause, you must also have the bigquery.tables.updateIndex permission.

Examples

The following example creates a vector index on the embedding column of my_table:

CREATE TABLE my_dataset.my_table(id INT64, embedding ARRAY<FLOAT64>);

CREATE VECTOR INDEX my_index
ON my_dataset.my_table(embedding)
OPTIONS (index_type = 'IVF');

The following example creates a vector index on the embedding column of my_table, and specifies the distance type to use and the IVF options:

CREATE TABLE my_dataset.my_table(id INT64, embedding ARRAY<FLOAT64>);

CREATE VECTOR INDEX my_index
ON my_dataset.my_table(embedding)
OPTIONS (
  index_type = 'IVF',
  distance_type = 'COSINE',
  ivf_options = '{"num_lists":2500}');

The following example creates a vector index on the embedding column of my_table, and specifies the distance type to use and the TREE_AH options:

CREATE TABLE my_dataset.my_table(id INT64, embedding ARRAY<FLOAT64>);

CREATE VECTOR INDEX my_index
ON my_dataset.my_table(embedding)
OPTIONS (
  index_type = 'TREE_AH',
  distance_type = 'EUCLIDEAN',
  tree_ah_options = '{"normalization_type": "L2"}');
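
For reference, a sketch of the STORING clause (reusing the table above; storing the id column in the index is an assumption for illustration):

CREATE VECTOR INDEX my_index
ON my_dataset.my_table(embedding)
STORING (id)
OPTIONS (index_type = 'IVF');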

CREATE DATA_POLICY statement

Preview

This feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of theService Specific Terms. Pre-GA features are available "as is" and might have limited support. For more information, see thelaunch stage descriptions.

Note: To provide feedback or request support for this feature, send an email tobigquery-security@google.com.

Creates or replaces a data policy. The name of the data policy must be unique within the project.

Syntax

CREATE [OR REPLACE] DATA_POLICY [IF NOT EXISTS]
`project_id.region-location_id.data_policy_id`
OPTIONS (index_option_list);

Arguments

  • OR REPLACE: Replaces any data policy with the same name if it exists. Can't appear with IF NOT EXISTS.

  • IF NOT EXISTS: If there is already a data policy by that name in the project, the CREATE statement has no effect.

  • project_id: The project ID of the project where the data policy will reside.

  • location_id: The location of the data policy.

  • data_policy_id: The name of the data policy, which must be unique within the project that the data policy resides in.

  • index_option_list: The list of options to set on the data policy.

index_option_list

The option list specifies options for the data policy. Specify the options in the following format: NAME=VALUE, ...

The following options are supported:

NAME | VALUE | Details

data_policy_type

STRING

The supported values are DATA_MASKING_POLICY and RAW_DATA_ACCESS_POLICY.

If not specified, the default value is RAW_DATA_ACCESS_POLICY.

You can't update this field once the data policy has been created.

A data policy of type DATA_MASKING_POLICY must also have masking_expression set.

masking_expression

STRING

Specifies the predefined masking rule or a custom masking routine.
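
For illustration, a sketch that creates a masking policy (the project, location, and policy name are hypothetical, and 'SHA256' is assumed to be one of the predefined masking rules):

CREATE DATA_POLICY
`myproject.region-us.my_masking_policy`
OPTIONS (
  data_policy_type = 'DATA_MASKING_POLICY',
  masking_expression = 'SHA256');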

Required permissions

The user or service account that creates a data policy must have the bigquery.dataPolicies.create permission.

The bigquery.dataPolicies.create permission is included in the BigQuery Data Policy Admin, BigQuery Admin, and BigQuery Data Owner roles.

If you are creating a data policy that references a custom masking routine, you also need routine permissions.

ALTER SCHEMA SET DEFAULT COLLATE statement

Sets collation specifications on a dataset.

Syntax

ALTER SCHEMA [IF EXISTS] [project_name.]dataset_name
SET DEFAULT COLLATE collate_specification

Arguments

  • IF EXISTS: If no dataset exists with that name, the statement has no effect.

  • DEFAULT COLLATE collate_specification: When a new table is created in the dataset, the table inherits a default collation specification unless a collation specification is explicitly specified for a column.

    The updated collation specification only applies to tables created afterwards.

  • project_name: The name of the project that contains the dataset. Defaults to the project that runs this DDL statement.

  • dataset_name: The name of the dataset.

  • collate_specification: Specifies the collation specification to set.

Required permissions

This statement requires the following IAM permissions:

Permission | Resource
bigquery.datasets.get | The dataset to alter.
bigquery.datasets.update | The dataset to alter.

Example

Assume you have an existing table, mytable_a, in a dataset called mydataset. For example:

CREATE SCHEMA mydataset

CREATE TABLE mydataset.mytable_a(number INT64, word STRING)

+----------------------+
| mydataset.mytable_a  |
|   number INT64       |
|   word STRING        |
+----------------------+

At a later time, you decide to add a collation specification to your dataset. For example:

ALTER SCHEMA mydataset SET DEFAULT COLLATE 'und:ci'

If you create a new table for your dataset, it inherits COLLATE 'und:ci' for all STRING columns. For example, collation is added to characters when you create the mytable_b table in the mydataset dataset:

CREATE TABLE mydataset.mytable_b (amount INT64, characters STRING)

+--------------------------------------+
| mydataset.mytable_b                  |
|   amount INT64                       |
|   characters STRING COLLATE 'und:ci' |
+--------------------------------------+

However, although you have updated the collation specification for the dataset, your existing table, mytable_a, continues to use the previous collation specification. For example:

+---------------------+
| mydataset.mytable_a |
|   number INT64      |
|   word STRING       |
+---------------------+

ALTER SCHEMA SET OPTIONS statement

Sets options on a dataset.

The statement runs in the location of the dataset if the dataset exists, unless you specify the location in the query settings. For more information, see Specifying your location.

Syntax

ALTER SCHEMA [IF EXISTS] [project_name.]dataset_name
SET OPTIONS (schema_set_options_list)

Arguments

  • IF EXISTS: If no dataset exists with that name, the statement has no effect.

  • project_name: The name of the project that contains the dataset. Defaults to the project that runs this DDL statement.

  • dataset_name: The name of the dataset.

  • schema_set_options_list: The list of options to set.

schema_set_options_list

The option list specifies options for the dataset. Specify the options in the following format: NAME=VALUE, ...

The following options are supported:

NAME | VALUE | Details
default_kms_key_name | STRING | Specifies the default Cloud KMS key for encrypting table data in this dataset. You can override this value when you create a table.
default_partition_expiration_days | FLOAT64 | Specifies the default expiration time, in days, for table partitions in this dataset. You can override this value when you create a table.
default_rounding_mode | STRING | Example: default_rounding_mode = "ROUND_HALF_EVEN". This specifies the defaultRoundingMode that is used for new tables created in this dataset. It does not impact existing tables. The following values are supported:
  • "ROUND_HALF_AWAY_FROM_ZERO": Halfway cases are rounded away from zero. For example, 2.25 is rounded to 2.3, and -2.25 is rounded to -2.3.
  • "ROUND_HALF_EVEN": Halfway cases are rounded towards the nearest even digit. For example, 2.25 is rounded to 2.2, and -2.25 is rounded to -2.2.
default_table_expiration_days | FLOAT64 | Specifies the default expiration time, in days, for tables in this dataset. You can override this value when you create a table.
description | STRING | The description of the dataset.
failover_reservation | STRING | Associates the dataset with a reservation in the case of a failover scenario.
friendly_name | STRING | A descriptive name for the dataset.
is_case_insensitive | BOOL | TRUE if the dataset and its table names are case-insensitive, otherwise FALSE. By default, this is FALSE, which means the dataset and its table names are case-sensitive.
  • Datasets: mydataset and MyDataset can coexist in the same project, unless one of them has case-sensitivity turned off.
  • Tables: mytable and MyTable can coexist in the same dataset if case-sensitivity for the dataset is turned on.
is_primary | BOOLEAN | Declares whether the dataset is the primary replica.
labels | ARRAY<STRUCT<STRING, STRING>> | An array of labels for the dataset, expressed as key-value pairs.
max_time_travel_hours | SMALLINT | Specifies the duration in hours of the time travel window for the dataset. The max_time_travel_hours value must be an integer expressed in multiples of 24 (48, 72, 96, 120, 144, 168) between 48 (2 days) and 168 (7 days). The default is 168 hours if this option isn't specified.
primary_replica | STRING | The replica name to set as the primary replica.
storage_billing_model | STRING | Alters the storage billing model for the dataset. Set the storage_billing_model value to PHYSICAL to use physical bytes when calculating storage charges, or to LOGICAL to use logical bytes. LOGICAL is the default. The storage_billing_model option is only available for datasets that have been updated after December 1, 2022. For datasets that were last updated before that date, the storage billing model is LOGICAL. When you change a dataset's billing model, it takes 24 hours for the change to take effect. Once you change a dataset's storage billing model, you must wait 14 days before you can change the storage billing model again.
tags | ARRAY<STRUCT<STRING, STRING>> | An array of IAM tags for the dataset, expressed as key-value pairs. The key should be the namespaced key name, and the value should be the short name.

Required permissions

This statement requires the following IAM permissions:

Permission | Resource
bigquery.datasets.get | The dataset to alter.
bigquery.datasets.update | The dataset to alter.

Examples

Setting the default table expiration for a dataset

The following example sets the default table expiration.

ALTER SCHEMA mydataset SET OPTIONS (default_table_expiration_days = 3.75)

Turning on case insensitivity for a dataset

The following example turns on case insensitivity for the name of a dataset and the table names within that dataset.

ALTER SCHEMA mydataset SET OPTIONS (is_case_insensitive = TRUE)
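Switching a dataset to physical-bytes billing

As a further illustration, switching a dataset to physical-bytes billing with the storage_billing_model option from the table above might look like the following sketch (the dataset name is hypothetical, and the change is subject to the timing rules described in the option table):

ALTER SCHEMA mydataset SET OPTIONS (storage_billing_model = 'PHYSICAL')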

ALTER SCHEMA ADD REPLICA statement

Adds a replica to a schema (preview).

Syntax

ALTER SCHEMA [IF EXISTS] [project_name.]dataset_name
ADD REPLICA replica_name
[OPTIONS (add_replica_options_list)]

Arguments

  • IF EXISTS: If no dataset exists with that name, the statement has no effect.
  • dataset_name: The name of the dataset to alter. See Table path syntax.
  • replica_name: The name of the new replica. Conventionally, this is the same as the location you are creating the replica in.
  • add_replica_options_list: The list of options to set.

add_replica_options_list

The option list specifies options for the dataset. Specify the options in the following format: NAME=VALUE, ...

The following options are supported:

NAME | VALUE | Details
location | STRING | The location in which to create the replica.
replica_kms_key | STRING | The Cloud Key Management Service key set in the destination region. replica_kms_key is used as a substitute encryption key in the destination region for any keys used in the source region. Any table in the source region that's encrypted with a Cloud KMS key is encrypted with the replica_kms_key. This value must be a Cloud KMS key created in the replica dataset's region, not the source dataset's region. For more information about setting up a Cloud KMS key, see Grant encryption and decryption permission.

Required permissions

To get the permissions that you need to manage replicas, ask your administrator to grant you the BigQuery Data Editor (roles/bigquery.dataEditor) IAM role on your schema. For more information about granting roles, see Manage access to projects, folders, and organizations.

You might also be able to get the required permissions through custom roles or other predefined roles.

Examples

The following example adds a secondary replica that is named EU in the EU multi-region to a schema that is named cross_region_dataset:

ALTER SCHEMA cross_region_dataset
ADD REPLICA `EU`
OPTIONS (location = 'eu');
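If the source dataset contains tables encrypted with Cloud KMS keys, the replica_kms_key option from the table above also needs to be set. The following is a hedged sketch, with a hypothetical key resource name in the replica's region:

ALTER SCHEMA cross_region_dataset
ADD REPLICA `us-west1`
OPTIONS (
  location = 'us-west1',
  -- The key must live in the replica's region, not the source region.
  replica_kms_key = 'projects/my_project/locations/us-west1/keyRings/my_keyring/cryptoKeys/my_key');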

ALTER SCHEMA DROP REPLICA statement

Drops a replica from a schema (preview).

Syntax

ALTER SCHEMA [IF EXISTS] dataset_name
DROP REPLICA replica_name

  • IF EXISTS: If no dataset exists with that name, the statement has no effect.
  • dataset_name: The name of the dataset to alter. See Table path syntax.
  • replica_name: The name of the replica to drop.

Required permissions

To get the permissions that you need to manage replicas, ask your administrator to grant you the BigQuery Data Editor (roles/bigquery.dataEditor) IAM role on your schema. For more information about granting roles, see Manage access to projects, folders, and organizations.

You might also be able to get the required permissions through custom roles or other predefined roles.

Examples

The following example removes a replica that is located in the us-east4 region from the cross_region_dataset dataset:

ALTER SCHEMA cross_region_dataset DROP REPLICA `us-east4`

ALTER TABLE SET OPTIONS statement

Sets the options on a table.

Syntax

ALTER TABLE [IF EXISTS] table_name
SET OPTIONS (table_set_options_list)

Arguments

  • IF EXISTS: If no table exists with that name, the statement has no effect.

  • table_name: The name of the table to alter. See Table path syntax.

  • table_set_options_list: The list of options to set.

Details

This statement is not supported for external tables.

table_set_options_list

The option list lets you set table options such as a label and an expiration time. You can include multiple options using a comma-separated list.

Specify a table option list in the following format:

NAME=VALUE, ...

NAME and VALUE must be one of the following combinations:

NAME | VALUE | Details

expiration_timestamp

TIMESTAMP

Example: expiration_timestamp=TIMESTAMP "2025-01-01 00:00:00 UTC"

This property is equivalent to the expirationTime table resource property.

partition_expiration_days

FLOAT64

Example: partition_expiration_days=7

Sets the partition expiration in days. For more information, see Set the partition expiration. By default, partitions don't expire.

This property is equivalent to the timePartitioning.expirationMs table resource property but uses days instead of milliseconds. One day is equivalent to 86400000 milliseconds, or 24 hours.

This property can only be set if the table is partitioned.

require_partition_filter

BOOL

Example: require_partition_filter=true

Specifies whether queries on this table must include a predicate filter that filters on the partitioning column. For more information, see Set partition filter requirements. The default value is false.

This property is equivalent to the timePartitioning.requirePartitionFilter table resource property.

This property can only be set if the table is partitioned.

kms_key_name

STRING

Example: kms_key_name="projects/project_id/locations/location/keyRings/keyring/cryptoKeys/key"

This property is equivalent to the encryptionConfiguration.kmsKeyName table resource property.

See more details about Protecting data with Cloud KMS keys.

friendly_name

STRING

Example: friendly_name="my_table"

This property is equivalent to the friendlyName table resource property.

description

STRING

Example: description="a table that expires in 2025"

This property is equivalent to the description table resource property.

labels

ARRAY<STRUCT<STRING, STRING>>

Example: labels=[("org_unit", "development")]

This property is equivalent to the labels table resource property.

default_rounding_mode

STRING

Example: default_rounding_mode = "ROUND_HALF_EVEN"

This specifies the default rounding mode that's used for values written to any new NUMERIC or BIGNUMERIC type columns or STRUCT fields in the table. It does not impact existing fields in the table. The following values are supported:

  • "ROUND_HALF_AWAY_FROM_ZERO": Halfway cases are rounded away from zero. For example, 2.5 is rounded to 3.0, and -2.5 is rounded to -3.
  • "ROUND_HALF_EVEN": Halfway cases are rounded towards the nearest even digit. For example, 2.5 is rounded to 2.0, and -2.5 is rounded to -2.0.

This property is equivalent to the defaultRoundingMode table resource property.

enable_change_history

BOOL

In preview.

Example: enable_change_history=TRUE

Set this property to TRUE in order to capture change history on the table, which you can then view by using the CHANGES function. Enabling this table option has an impact on costs; for more information, see Pricing and costs. The default is FALSE.

max_staleness

INTERVAL

Example: max_staleness=INTERVAL "4:0:0" HOUR TO SECOND

The maximum interval behind the current time where it's acceptable to read stale data. For example, with change data capture, when this option is set, the table copy operation is denied if data is more stale than the max_staleness value.

max_staleness is disabled by default.

enable_fine_grained_mutations

BOOL

In preview.

Example: enable_fine_grained_mutations=TRUE

Set this property to TRUE to enable fine-grained DML optimization on the table. The default is FALSE.

storage_uri

STRING

In preview.

Example: storage_uri=gs://BUCKET_DIRECTORY/TABLE_DIRECTORY/

A fully qualified location prefix for the external folder where data is stored. Supports gs: buckets.

Required for managed tables.

file_format

STRING

In preview.

Example: file_format=PARQUET

The open-source file format in which the table data is stored. Only PARQUET is supported.

Required for managed tables.

The default is PARQUET.

table_format

STRING

In preview.

Example: table_format=ICEBERG

The open table format in which metadata-only snapshots are stored. Only ICEBERG is supported.

Required for managed tables.

The default is ICEBERG.

tags | ARRAY<STRUCT<STRING, STRING>> | An array of IAM tags for the table, expressed as key-value pairs. The key should be the namespaced key name, and the value should be the short name.

VALUE is a constant expression containing only literals, query parameters, and scalar functions.

The constant expression cannot contain:

  • A reference to a table
  • Subqueries or SQL statements such as SELECT, CREATE, or UPDATE
  • User-defined functions, aggregate functions, or analytic functions
  • The following scalar functions:
    • ARRAY_TO_STRING
    • REPLACE
    • REGEXP_REPLACE
    • RAND
    • FORMAT
    • LPAD
    • RPAD
    • REPEAT
    • SESSION_USER
    • GENERATE_ARRAY
    • GENERATE_DATE_ARRAY

Setting the value replaces the existing value of that option for the table, if there was one. Setting the value to NULL clears the table's value for that option.

Required permissions

This statement requires the following IAM permissions:

Permission | Resource
bigquery.tables.get | The table to alter.
bigquery.tables.update | The table to alter.

Examples

Setting the expiration timestamp and description on a table

The following example sets the expiration timestamp on a table to seven days from the execution time of the ALTER TABLE statement, and sets the description as well:

ALTER TABLE mydataset.mytable
SET OPTIONS (
  expiration_timestamp = TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 7 DAY),
  description = "Table that expires seven days from now")

Setting the require partition filter attribute on a partitioned table

The following example sets the timePartitioning.requirePartitionFilter attribute on a partitioned table:

ALTER TABLE mydataset.mypartitionedtable
SET OPTIONS (require_partition_filter = true)

Queries that reference this table must use a filter on the partitioning column, or else BigQuery returns an error. Setting this option to true can help prevent mistakes in querying more data than intended.

Clearing the expiration timestamp on a table

The following example clears the expiration timestamp on a table so that it will not expire:

ALTER TABLE mydataset.mytable
SET OPTIONS (expiration_timestamp = NULL)
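Other options from the table above can be set in the same way. For example, the max_staleness interval could be set like this (a sketch; the four-hour window is arbitrary):

ALTER TABLE mydataset.mytable
SET OPTIONS (max_staleness = INTERVAL "4:0:0" HOUR TO SECOND)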

ALTER TABLE ADD COLUMN statement

Adds one or more new columns to an existing table schema.

Syntax

ALTER TABLE table_name
ADD COLUMN [IF NOT EXISTS] column [, ...]

Arguments

  • table_name: The name of the table. See Table path syntax.

  • IF NOT EXISTS: If the column name already exists, the statement has no effect.

  • column: The column to add. This includes the name of the column and schema to add. The column name and schema use the same syntax used in the CREATE TABLE statement.

Details

You cannot use this statement to create:

  • Partitioned columns.
  • Clustered columns.
  • Nested columns inside existing RECORD fields.

You cannot add a REQUIRED column to an existing table schema. However, you can create a nested REQUIRED column as part of a new RECORD field.

This statement is not supported for external tables.

Without the IF NOT EXISTS clause, if the table already contains a column with that name, the statement returns an error. If the IF NOT EXISTS clause is included and the column name already exists, no error is returned, and no action is taken.

The value of the new column for existing rows is set to one of the following:

  • NULL if the new column was added with NULLABLE mode. This is the default mode.
  • An empty ARRAY if the new column was added with REPEATED mode.

For more information about schema modifications in BigQuery, see Modifying table schemas.

Required permissions

This statement requires the following IAM permissions:

Permission | Resource
bigquery.tables.get | The table to alter.
bigquery.tables.update | The table to alter.

Examples

Adding columns

The following example adds the following columns to an existing table named mytable:

  • Column A of type STRING.
  • Column B of type GEOGRAPHY.
  • Column C of type NUMERIC with REPEATED mode.
  • Column D of type DATE with a description.

ALTER TABLE mydataset.mytable
ADD COLUMN A STRING,
ADD COLUMN IF NOT EXISTS B GEOGRAPHY,
ADD COLUMN C ARRAY<NUMERIC>,
ADD COLUMN D DATE OPTIONS (description = "my description")

If any of the columns named A, C, or D already exist, the statement fails. If column B already exists, the statement succeeds because of the IF NOT EXISTS clause.

Adding a RECORD column

The following example adds a column named A of type STRUCT that contains the following nested columns:

  • Column B of type GEOGRAPHY.
  • Column C of type INT64 with REPEATED mode.
  • Column D of type INT64 with REQUIRED mode.
  • Column E of type TIMESTAMP with a description.

ALTER TABLE mydataset.mytable
ADD COLUMN A STRUCT<
  B GEOGRAPHY,
  C ARRAY<INT64>,
  D INT64 NOT NULL,
  E TIMESTAMP OPTIONS (description = "creation time")>

The query fails if the table already has a column named A, even if that column does not contain any of the nested columns that are specified.

The new STRUCT named A is nullable, but the nested column D within A is required for any STRUCT values of A.

Adding collation support to a column

When you create a new column for your table, you can specifically assign a new collation specification to that column.

ALTER TABLE mydataset.mytable
ADD COLUMN word STRING COLLATE 'und:ci'

ALTER TABLE ADD FOREIGN KEY statement

Adds a foreign key constraint to an existing table. You can add multiple foreign key constraints by using additional ADD FOREIGN KEY statements.

Syntax

ALTER TABLE [[project_name.]dataset_name.]fk_table_name
ADD [CONSTRAINT [IF NOT EXISTS] constraint_name]
FOREIGN KEY (fk_column_name[, ...])
REFERENCES pk_table_name (pk_column_name[, ...]) NOT ENFORCED
[ADD ...];

Arguments

  • project_name: The name of the project containing the table with a primary key. Defaults to the project that runs this DDL statement if undefined.
  • dataset_name: The name of the dataset that contains the table with a primary key. Defaults to the project that runs this DDL statement if undefined.
  • fk_table_name: The name of the existing table to add a foreign key to.
  • IF NOT EXISTS: If a constraint of the same name already exists in the defined table, the statement has no effect.
  • constraint_name: The name of the constraint to add.
  • fk_column_name: In the foreign key table, the name of the foreign key column. Only top-level columns can be used as foreign key columns.
  • pk_table_name: The name of the table that contains the primary key.
  • pk_column_name: In the primary key table, the name of the primary key column. Only top-level columns can be used as primary key columns.

Required permissions

This statement requires the following IAM permissions:

Permission | Resource
bigquery.tables.get | The table to alter.
bigquery.tables.update | The table to alter.

Examples

The following example adds the my_fk_name foreign key constraint to the fk_table table. This example depends on an existing table, pk_table.

  1. Add a primary key to the pk_table table:

    ALTER TABLE pk_table ADD PRIMARY KEY (x, y) NOT ENFORCED;

  2. Create a table named fk_table for the foreign key.

    CREATE TABLE fk_table (x INT64, y INT64, i INT64, j INT64, u INT64, v INT64);

  3. Add the my_fk_name foreign key constraint to the fk_table.

    ALTER TABLE fk_table
    ADD CONSTRAINT my_fk_name FOREIGN KEY (u, v)
    REFERENCES pk_table (x, y) NOT ENFORCED

The following example adds the fk and fk2 foreign key constraints to the fk_table table in a single statement. This example depends on an existing table, pk_table.

  1. Add a primary key to the pk_table table:

    ALTER TABLE pk_table ADD PRIMARY KEY (x, y) NOT ENFORCED;

  2. Create a table named fk_table for multiple foreign key constraints.

    CREATE TABLE fk_table (x INT64, y INT64, i INT64, j INT64, u INT64, v INT64);

  3. Add the fk and fk2 constraints to fk_table in one statement.

    ALTER TABLE fk_table
    ADD PRIMARY KEY (x, y) NOT ENFORCED,
    ADD CONSTRAINT fk FOREIGN KEY (u, v) REFERENCES pk_table (x, y) NOT ENFORCED,
    ADD CONSTRAINT fk2 FOREIGN KEY (i, j) REFERENCES pk_table (x, y) NOT ENFORCED;

ALTER TABLE ADD PRIMARY KEY statement

Adds a primary key to an existing table.

Syntax

ALTER TABLE [[project_name.]dataset_name.]table_name
ADD PRIMARY KEY (column_list) NOT ENFORCED;

Arguments

  • project_name: The name of the project containing the table with a primary key. Defaults to the project that runs this DDL statement if undefined.
  • dataset_name: The name of the dataset that contains the table with a primary key.
  • table_name: The name of the existing table with a primary key.
  • column_list: The list of columns to be added as primary keys.

Required permissions

This statement requires the following IAM permissions:

Permission | Resource
bigquery.tables.get | The table to alter.
bigquery.tables.update | The table to alter.

Examples

The following example adds the primary key constraint of x and y to the pk_table table.

ALTER TABLE pk_table ADD PRIMARY KEY (x, y) NOT ENFORCED;

ALTER TABLE RENAME TO statement

Renames a clone, snapshot, or table.

The ALTER TABLE RENAME TO statement recreates the table in the destination dataset with the creation timestamp of the original table. If you have configured dataset-level table expiration, the renamed table might be immediately deleted if its original creation timestamp falls outside of the expiration window.

Caution: Renaming a table deletes all tags (deprecated) or aspects that may be attached to it or its columns in Data Catalog or Dataplex Universal Catalog, respectively.

Syntax

ALTER TABLE [IF EXISTS] table_name
RENAME TO new_table_name

Arguments

  • IF EXISTS: If no table exists with that name, the statement has no effect.

  • table_name: The name of the table to rename. See Table path syntax.

  • new_table_name: The new name of the table. The value of new_table_name must only include the name of the table, not the full table path syntax. The new name cannot be an existing table name.

Details

  • If you want to rename a table that has data streaming into it, you must stop the streaming, commit any pending streams, and wait for BigQuery to indicate that streaming is not in use.
  • While a table can usually be renamed 5 hours after the last streaming operation, it might take longer.
  • Existing table ACLs and row access policies are preserved, but table ACL and row access policy updates made during the table rename are not preserved.
  • You can't concurrently rename a table and run a DML statement on that table.
  • Renaming a table removes all Data Catalog tags (deprecated) and Dataplex Universal Catalog aspects on the table.
  • Any search index or vector index created on the table is dropped when the table is renamed.
  • You can't rename external tables.

Required permissions

This statement requires the following IAM permissions:

Permission | Resource
bigquery.tables.get | The table to alter.
bigquery.tables.update | The table to alter.

Examples

Renaming a table

The following example renames the table mydataset.mytable to mydataset.mynewtable:

ALTER TABLE mydataset.mytable RENAME TO mynewtable

ALTER TABLE RENAME COLUMN statement

Caution: Renaming a column deletes all Data Catalog tags (deprecated) and Dataplex Universal Catalog aspects that are attached to it. Primary key columns can't be renamed.

Renames one or more columns in an existing table schema.

Syntax

ALTER TABLE [IF EXISTS] table_name
RENAME COLUMN [IF EXISTS] column_to_column[, ...]

column_to_column :=
  column_name TO new_column_name

Arguments

  • (ALTER TABLE) IF EXISTS: If the specified table does not exist, the statement has no effect.

  • table_name: The name of the table to alter. See Table path syntax.

  • (ALTER COLUMN) IF EXISTS: If the specified column does not exist, the statement has no effect.

  • column_name: The name of the top-level column you're altering.

  • new_column_name: The new name of the column. The new name cannot be an existing column name.

Details

This statement is not supported for external tables.

If the table to be modified has active row-level access policies, the statement returns an error.

Without the IF EXISTS clause, if the table does not contain a column with that name, then the statement returns an error. If the IF EXISTS clause is included and the column name does not exist, then no error is returned, and no action is taken.

This statement only renames the column in the table. Any objects that refer to the column, such as views or materialized views, must be updated or recreated separately.

You cannot use this statement to rename the following:

  • Subfields, such as nested columns in a STRUCT
  • Partitioning columns
  • Clustering columns
  • Fields that are part of primary key constraints or foreign key constraints
  • Columns in a table that has row access policies

After one or more columns in a table are renamed, you cannot do the following:

  • Query the table with legacy SQL.
  • Query the table as a wildcard table.

Renaming the columns with their original names removes these restrictions.

Multiple RENAME COLUMN statements in one ALTER TABLE statement are supported. The sequence of renames is interpreted and validated in order. Each column_name must refer to a column name that exists after all preceding renames have been applied. RENAME COLUMN cannot be used with other ALTER TABLE actions in one statement.

Required permissions

This statement requires the following IAM permissions:

Permission | Resource
bigquery.tables.get | The table to alter.
bigquery.tables.update | The table to alter.

Examples

Renaming columns

The following example renames columns from an existing table named mytable:

  • Column A -> columnA
  • Column B -> columnB

ALTER TABLE mydataset.mytable
RENAME COLUMN A TO columnA,
RENAME COLUMN IF EXISTS B TO columnB

If column A does not exist, then the statement fails. If column B does not exist, then the statement still succeeds because of the IF EXISTS clause.

The following example swaps the names of columnA and columnB:

ALTER TABLE mydataset.mytable
RENAME COLUMN columnA TO temp_col,
RENAME COLUMN columnB TO columnA,
RENAME COLUMN temp_col TO columnB

ALTER TABLE DROP COLUMN statement

Drops one or more columns from an existing table schema.

Syntax

ALTER TABLE table_name
DROP COLUMN [IF EXISTS] column_name [, ...]

Arguments

  • table_name: The name of the table to alter. See Table path syntax. The table must already exist and have a schema.

  • IF EXISTS: If the specified column does not exist, the statement has no effect.

  • column_name: The name of the column to drop.

Details

Dropping a column is a metadata-only operation and does not immediately free up the storage that is associated with the dropped column. The storage is freed up the next time the table is written to, typically when you perform a DML operation on it or when a background optimization job happens. Since DROP COLUMN is not a data cleanup operation, there is no guaranteed time window within which the data will be deleted.

There are two options for immediately reclaiming storage:

  • Overwrite the table with a SELECT * EXCEPT query (see the sketch after this list).
  • Export the data to Cloud Storage, delete the unwanted columns, and then load the data into a new table with the correct schema.
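A minimal sketch of the first option, assuming the goal is to physically remove a column A from mydataset.mytable (the table and column names are illustrative):

-- Rewrites the table without column A, which reclaims its storage.
CREATE OR REPLACE TABLE mydataset.mytable AS
SELECT * EXCEPT (A)
FROM mydataset.mytable;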

You can restore a dropped column in a table using time travel. You cannot use this statement to drop the following:

  • Partitioning columns
  • Clustering columns

After one or more columns in a table are dropped, you cannot do the following:

  • Query the table with legacy SQL.
  • Query the table as a wildcard table.

This statement is not supported for external tables.

Without the IF EXISTS clause, if the table does not contain a column with that name, then the statement returns an error. If the IF EXISTS clause is included and the column name does not exist, then no error is returned, and no action is taken.

This statement only removes the column from the table. Any objects that refer to the column, such as views or materialized views, must be updated or recreated separately.

For more information about schema modifications in BigQuery, see Modifying table schemas.

Required permissions

This statement requires the following IAM permissions:

Permission | Resource
bigquery.tables.get | The table to alter.
bigquery.tables.update | The table to alter.

Examples

Dropping columns

The following example drops the following columns from an existing table named mytable:

  • Column A
  • Column B

ALTER TABLE mydataset.mytable
DROP COLUMN A,
DROP COLUMN IF EXISTS B

If the column named A does not exist, then the statement fails. If column B does not exist, then the statement still succeeds because of the IF EXISTS clause.

After one or more columns in a table are dropped, you cannot do the following:

  • Query the table with legacy SQL.
  • Accelerate queries on the table with BigQuery BI Engine.
  • Query the table as a wildcard table.
  • Copy the table in the Google Cloud console.
  • Copy the table using the bq cp command.

Recreating the table using CREATE TABLE ... AS SELECT ... removes these restrictions.

ALTER TABLE DROP CONSTRAINT statement

Drops a constraint from an existing table. You can use this statement to drop foreign key constraints from a table.

Syntax

ALTER TABLE [[project_name.]dataset_name.]table_name
DROP CONSTRAINT [IF EXISTS] constraint_name;

Arguments

  • project_name: The name of the project that contains the table. Defaults to the project that runs this DDL statement if undefined.
  • dataset_name: The name of the dataset that contains the table.
  • table_name: The name of the existing table to alter.
  • IF EXISTS: If no constraint with that name exists in the defined table, the statement has no effect.
  • constraint_name: The name of the constraint to drop.

Required permissions

This statement requires the following IAM permissions:

Permission | Resource
bigquery.tables.get | The table to alter.
bigquery.tables.update | The table to alter.

Examples

The following example drops the constraint myConstraint from the existing table mytable.

ALTER TABLE mytable DROP CONSTRAINT myConstraint;

ALTER TABLE DROP PRIMARY KEY statement

Drops a primary key from an existing table.

Syntax

ALTER TABLE [[project_name.]dataset_name.]table_name
DROP PRIMARY KEY [IF EXISTS];

Arguments

  • project_name: The name of the project containing the table with a primary key. Defaults to the project that runs this DDL statement if undefined.
  • dataset_name: The name of the dataset that contains the table with a primary key.
  • table_name: The name of the existing table with a primary key.
  • IF EXISTS: If no primary key exists in the defined table, the statement has no effect.

Required permissions

This statement requires the following IAM permissions:

Permission | Resource
bigquery.tables.get | The table to alter.
bigquery.tables.update | The table to alter.

Examples

The following example drops the primary key from the existing table myTable.

ALTER TABLE myTable DROP PRIMARY KEY;

ALTER TABLE SET DEFAULT COLLATE statement

Sets collation specifications on a table.

Syntax

ALTER TABLE table_name
SET DEFAULT COLLATE collate_specification

Arguments

  • table_name: The name of the table to alter. See Table path syntax. The table must already exist and have a schema.

  • SET DEFAULT COLLATE collate_specification: When a new column is created in the table, and if the column does not have an explicit collation specification, the column inherits this collation specification for STRING types. The updated collation specification only applies to columns added afterwards.

    If you want to add a collation specification on a new column in an existing table, you can do this when you add the column. If you add a collation specification directly on a column, the collation specification for the column has precedence over a table's default collation specification. You cannot update an existing collation specification on a column.

Required permissions

This statement requires the following IAM permissions:

Permission | Resource
bigquery.tables.get | The table to alter.
bigquery.tables.update | The table to alter.

Example

Assume you have an existing table, mytable, in a schema called mydataset.

CREATE TABLE mydataset.mytable (number INT64, word STRING)
DEFAULT COLLATE 'und:ci'

When you create mytable, all STRING columns inherit COLLATE 'und:ci'. The resulting table has this structure:

+--------------------------------+
| mydataset.mytable              |
|   number INT64                 |
|   word STRING COLLATE 'und:ci' |
+--------------------------------+

At a later time, you decide to change the collation specification for your table.

ALTER TABLE mydataset.mytable SET DEFAULT COLLATE ''

Although you have updated the collation specification, your existing column, word, continues to use the previous collation specification.

+--------------------------------+
| mydataset.mytable              |
|   number INT64                 |
|   word STRING COLLATE 'und:ci' |
+--------------------------------+

However, if you create a new column for your table, the new column includes the new collation specification. In the following example, a column called name is added. Because the new collation specification is empty, the default collation specification is used.

ALTER TABLE mydataset.mytable ADD COLUMN name STRING

+--------------------------------+
| mydataset.mytable              |
|   number INT64                 |
|   word STRING COLLATE 'und:ci' |
|   name STRING COLLATE          |
+--------------------------------+

ALTER COLUMN SET OPTIONS statement

Sets options, such as the column description, on a column in a table or view in BigQuery.

Syntax

ALTER {TABLE | VIEW} [IF EXISTS] name
ALTER COLUMN [IF EXISTS] column_name
SET OPTIONS ({column_set_options_list | view_column_set_options_list})

Arguments

  • (ALTER { TABLE | VIEW }) IF EXISTS: If no table or view exists with that name, then the statement has no effect.

  • name: The name of the table or view to alter. See Table path syntax.

  • (ALTER COLUMN) IF EXISTS: If the specified column does not exist, the statement has no effect.

  • column_name: The name of the top-level column you're altering. Modifying subfields, such as nested columns in a STRUCT, is not supported.

  • column_set_options_list: The list of options to set on the column of the table. This option must be used with TABLE.

  • view_column_set_options_list: The list of options to set on the column of the view. This option must be used with VIEW.

Details

This statement is not supported for external tables.

column_set_options_list

Specify a column option list in the following format:

NAME=VALUE, ...

NAME and VALUE must be one of the following combinations:

NAME | VALUE | Details

description

STRING

Example: description="a unique id"

This property is equivalent to the schema.fields[].description table resource property.

rounding_mode

STRING

Example: rounding_mode = "ROUND_HALF_EVEN"

This specifies the rounding mode that's used for values written to a NUMERIC or BIGNUMERIC type column or STRUCT field. The following values are supported:

  • "ROUND_HALF_AWAY_FROM_ZERO": Halfway cases are rounded away from zero. For example, 2.25 is rounded to 2.3, and -2.25 is rounded to -2.3.
  • "ROUND_HALF_EVEN": Halfway cases are rounded towards the nearest even digit. For example, 2.25 is rounded to 2.2 and -2.25 is rounded to -2.2.

This property is equivalent to the roundingMode table resource property.

data_policies

ARRAY<STRING>

Applies a data policy to a column in a table (Preview).

Example: data_policies = ["{'name':'myproject.region-us.data_policy_name1'}", "{'name':'myproject.region-us.data_policy_name2'}"]

The ALTER TABLE ALTER COLUMN statement supports the = and += operators to add data policies to a specific column.

Example: data_policies += ["data_policy1", "data_policy2"]

VALUE is a constant expression containing only literals, query parameters, and scalar functions.

The constant expression cannot contain:

  • A reference to a table
  • Subqueries or SQL statements such as SELECT, CREATE, or UPDATE
  • User-defined functions, aggregate functions, or analytic functions
  • The following scalar functions:
    • ARRAY_TO_STRING
    • REPLACE
    • REGEXP_REPLACE
    • RAND
    • FORMAT
    • LPAD
    • RPAD
    • REPEAT
    • SESSION_USER
    • GENERATE_ARRAY
    • GENERATE_DATE_ARRAY

Setting the VALUE replaces the existing value of that option for the column, if there was one. Setting the VALUE to NULL clears the column's value for that option.

view_column_set_options_list

The view_column_set_options_list lets you specify optional top-level column options. Column options for a view have the same syntax and requirements as for a table, but with a different list of NAME and VALUE fields:

NAME | VALUE | Details

description

STRING

Example: description="a unique id"

Required permissions

This statement requires the following IAM permissions:

Permission | Resource
bigquery.tables.get | The table to alter.
bigquery.tables.update | The table to alter.

Examples

The following example sets a new description on a table column called price:

ALTER TABLE mydataset.mytable
ALTER COLUMN price
SET OPTIONS (description = 'Price per unit');

The following example sets a new description on a view column called total:

ALTER VIEW mydataset.myview
ALTER COLUMN total
SET OPTIONS (description = 'Total sales of the product');

ALTER COLUMN DROP NOT NULL statement

Removes a NOT NULL constraint from a column in a table in BigQuery.

Syntax

ALTER TABLE [IF EXISTS] table_name
ALTER COLUMN [IF EXISTS] column_name
DROP NOT NULL

Arguments

  • (ALTER TABLE) IF EXISTS: If no table exists with that name, the statement has no effect.

  • table_name: The name of the table to alter. See Table path syntax.

  • (ALTER COLUMN) IF EXISTS: If the specified column does not exist, the statement has no effect.

  • column_name: The name of the top-level column you're altering. Modifying subfields is not supported.

Details

If a column does not have a NOT NULL constraint, the query returns an error.

This statement is not supported for external tables.

Required permissions

This statement requires the following IAM permissions:

Permission | Resource
bigquery.tables.get | The table to alter.
bigquery.tables.update | The table to alter.

Examples

The following example removes the NOT NULL constraint from a column called mycolumn:

ALTER TABLE mydataset.mytable
ALTER COLUMN mycolumn
DROP NOT NULL

ALTER COLUMN SET DATA TYPE statement

Changes the data type of a column in a table in BigQuery to a less restrictive data type. For example, a NUMERIC data type can be changed to a BIGNUMERIC type but not the reverse.

Syntax

ALTER TABLE [IF EXISTS] table_name
ALTER COLUMN [IF EXISTS] column_name
SET DATA TYPE column_schema

Arguments

  • (ALTER TABLE) IF EXISTS: If no table exists with that name, the statement has no effect.

  • table_name: The name of the table to alter. See Table path syntax.

  • (ALTER COLUMN) IF EXISTS: If the specified column does not exist, the statement has no effect.

  • column_name: The name of the top-level column you're altering. Modifying subfields is not supported.

  • column_schema: The schema that you're converting the column to. This schema uses the same syntax used in the CREATE TABLE statement.

Details

The following data type conversions are supported:

  • INT64 to NUMERIC, BIGNUMERIC, FLOAT64
  • NUMERIC to BIGNUMERIC, FLOAT64

You can also convert data types from more restrictive to less restrictive parameterized data types. For example, you can increase the maximum length of a string type or increase the precision or scale of a numeric type.

The following are examples of valid parameterized data type conversions:

  • NUMERIC(10, 6) to NUMERIC(12, 8)
  • NUMERIC to BIGNUMERIC(40, 20)
  • STRING(5) to STRING(7)

This statement is not supported for external tables.

Without the IF EXISTS clause, if the table does not contain a column with that name, the statement returns an error. If the IF EXISTS clause is included and the column name does not exist, no error is returned, and no action is taken.

Required permissions

This statement requires the following IAM permissions:

Permission | Resource
bigquery.tables.get | The table to alter.
bigquery.tables.update | The table to alter.

Examples

Changing the data type for a column

The following example changes the data type of column c1 from INT64 to NUMERIC:

CREATE TABLE dataset.my_table (c1 INT64);

ALTER TABLE dataset.my_table ALTER COLUMN c1 SET DATA TYPE NUMERIC;

Changing the data type for a field

The following example changes the data type of one of the fields in the s1 column:

CREATE TABLE dataset.my_table (s1 STRUCT<a INT64, b STRING>);

ALTER TABLE dataset.my_table ALTER COLUMN s1 SET DATA TYPE STRUCT<a NUMERIC, b STRING>;

Changing precision

The following example changes the precision of a parameterized data typecolumn:

CREATE TABLE dataset.my_table (pt NUMERIC(7, 2));

ALTER TABLE dataset.my_table ALTER COLUMN pt SET DATA TYPE NUMERIC(8, 2);

ALTER COLUMN SET DEFAULT statement

Sets the default value of a column.

Syntax

ALTER TABLE [IF EXISTS] table_name
ALTER COLUMN [IF EXISTS] column_name
SET DEFAULT default_expression;

Arguments

  • (ALTER TABLE) IF EXISTS: If no table exists with that name, the statement has no effect.

  • table_name: The name of the table to alter. See Table path syntax.

  • (ALTER COLUMN) IF EXISTS: If the specified column does not exist, the statement has no effect.

  • column_name: The name of the top-level column you're altering.

  • default_expression: The default value expression to assign to the column.

Details

Setting the default value for a column only affects future inserts to the table. It does not change any existing table data.

The type of the default value must match the type of the column. A STRUCT type can only have a default value set for the entire STRUCT field. You cannot set the default value for a subset of the fields. You cannot set the default value of an array to NULL or set an element within an array to NULL.

If the default value is a function, it is evaluated at the time that the valueis written to the table, not the time the table is created.

You can't set default values on columns that are primary keys.

Required permissions

This statement requires the following IAM permissions:

Permission | Resource
bigquery.tables.get | The table to alter.
bigquery.tables.update | The table to alter.

Examples

The following example sets the default value of the column mycolumn to the current time:

ALTER TABLE mydataset.mytable
ALTER COLUMN mycolumn
SET DEFAULT CURRENT_TIME();

ALTER COLUMN DROP DEFAULT statement

Removes the default value assigned to a column. This is the same as setting the default value to NULL.

Syntax

ALTER TABLE [IF EXISTS] table_name
ALTER COLUMN [IF EXISTS] column_name
DROP DEFAULT;

Arguments

  • (ALTER TABLE) IF EXISTS: If the specified table does not exist, the statement has no effect.

  • table_name: The name of the table to alter. See Table path syntax.

  • (ALTER COLUMN) IF EXISTS: If the specified column does not exist, the statement has no effect.

  • column_name: The name of the top-level column to remove the default value from. If you drop the default value from a column that does not have a default set, an error is returned.

Required permissions

This statement requires the following IAM permissions:

Permission | Resource
bigquery.tables.get | The table to alter.
bigquery.tables.update | The table to alter.

Examples

The following example removes the default value from the column mycolumn:

ALTER TABLE mydataset.mytable
ALTER COLUMN mycolumn
DROP DEFAULT;

ALTER VIEW SET OPTIONS statement

Sets the options on a view.

Syntax

ALTER VIEW [IF EXISTS] view_name
SET OPTIONS (view_set_options_list)

Arguments

  • IF EXISTS: If no view exists with that name, the statement has no effect.

  • view_name: The name of the view to alter. See Table path syntax.

  • view_set_options_list: The list of options to set.

view_set_options_list

The option list allows you to set view options such as a label and an expiration time. You can include multiple options using a comma-separated list.

Specify a view option list in the following format:

NAME=VALUE, ...

NAME and VALUE must be one of the following combinations:

NAME | VALUE | Details

expiration_timestamp

TIMESTAMP

Example: expiration_timestamp=TIMESTAMP "2025-01-01 00:00:00 UTC"

This property is equivalent to the expirationTime table resource property.

friendly_name

STRING

Example: friendly_name="my_view"

This property is equivalent to the friendlyName table resource property.

description

STRING

Example: description="a view that expires in 2025"

This property is equivalent to the description table resource property.

labels

ARRAY<STRUCT<STRING, STRING>>

Example: labels=[("org_unit", "development")]

This property is equivalent to the labels table resource property.

privacy_policy

JSON-formatted STRING

The policies to enforce when anyone queries the view. To learn more about the policies available for a view, see the privacy_policy view option.

Note: Time travel is disabled on any view that has an analysis rule.
tags | ARRAY<STRUCT<STRING, STRING>> | An array of IAM tags for the view, expressed as key-value pairs. The key should be the namespaced key name, and the value should be the short name.

VALUE is a constant expression containing only literals, query parameters, and scalar functions.

The constant expression cannot contain:

  • A reference to a table
  • Subqueries or SQL statements such as SELECT, CREATE, or UPDATE
  • User-defined functions, aggregate functions, or analytic functions
  • The following scalar functions:
    • ARRAY_TO_STRING
    • REPLACE
    • REGEXP_REPLACE
    • RAND
    • FORMAT
    • LPAD
    • RPAD
    • REPEAT
    • SESSION_USER
    • GENERATE_ARRAY
    • GENERATE_DATE_ARRAY

Setting the value replaces the existing value of that option for the view, if there was one. Setting the value to NULL clears the view's value for that option.

Required permissions

This statement requires the following IAM permissions:

Permission | Resource
bigquery.tables.get | The view to alter.
bigquery.tables.update | The view to alter.

Examples

Setting the expiration timestamp and description on a view

The following example sets the expiration timestamp on a view to seven days from the execution time of the ALTER VIEW statement, and sets the description as well:

ALTER VIEW mydataset.myview
SET OPTIONS (
  expiration_timestamp = TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 7 DAY),
  description = "View that expires seven days from now")
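As with tables, setting a view option to NULL clears it. For example, the following sketch removes the expiration from the same (hypothetical) view:

ALTER VIEW mydataset.myview
SET OPTIONS (expiration_timestamp = NULL)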

ALTER MATERIALIZED VIEW SET OPTIONS statement

Sets the options on a materialized view.

Syntax

ALTER MATERIALIZED VIEW [IF EXISTS] materialized_view_name
SET OPTIONS (materialized_view_set_options_list)

Arguments

  • IF EXISTS: If no materialized view exists with that name, the statement has no effect.

  • materialized_view_name: The name of the materialized view to alter. See Table path syntax.

  • materialized_view_set_options_list: The list of options to set.

materialized_view_set_options_list

The option list allows you to set materialized view options such as whether refresh is enabled, the refresh interval, a label, and an expiration time. You can include multiple options using a comma-separated list.

Specify a materialized view option list in the following format:

NAME=VALUE, ...

NAME and VALUE must be one of the following combinations:

NAME | VALUE | Details

enable_refresh

BOOLEAN

Example: enable_refresh=false
Default: true

refresh_interval_minutes

FLOAT64

Example: refresh_interval_minutes=20
Default: refresh_interval_minutes=30

expiration_timestamp

TIMESTAMP

Example: expiration_timestamp=TIMESTAMP "2025-01-01 00:00:00 UTC"

This property is equivalent to the expirationTime table resource property. expiration_timestamp is optional and not used by default.

max_staleness

INTERVAL

Example: max_staleness=INTERVAL "4:0:0" HOUR TO SECOND

The max_staleness property provides consistently high performance with controlled costs when processing large, frequently changing datasets. max_staleness is disabled by default.

allow_non_incremental_definition

BOOLEAN

Example: allow_non_incremental_definition=true

The allow_non_incremental_definition property supports an expanded range of SQL queries to create materialized views. allow_non_incremental_definition=true is disabled by default. It is supported by the CREATE MATERIALIZED VIEW statement only; the allow_non_incremental_definition property can't be changed after the materialized view is created.

kms_key_name

STRING

Example: kms_key_name="projects/project_id/locations/location/keyRings/keyring/cryptoKeys/key"

This property is equivalent to the encryptionConfiguration.kmsKeyName table resource property.

See more details about Protecting data with Cloud KMS keys.

friendly_name

STRING

Example: friendly_name="my_mv"

This property is equivalent to the friendlyName table resource property.

description

STRING

Example: description="a materialized view that expires in 2025"

This property is equivalent to the description table resource property.

labels

ARRAY<STRUCT<STRING, STRING>>

Example: labels=[("org_unit", "development")]

This property is equivalent to the labels table resource property.

tags | ARRAY<STRUCT<STRING, STRING>> | An array of IAM tags for the materialized view, expressed as key-value pairs. The key should be the namespaced key name, and the value should be the short name.

Setting the value replaces the existing value of that option for the materialized view, if there was one. Setting the value to NULL clears the materialized view's value for that option.

Required permissions

This statement requires the following IAM permissions:

Permission | Resource
bigquery.tables.get | The materialized view to alter.
bigquery.tables.update | The materialized view to alter.

Examples

Setting the enable refresh state and refresh interval on a materialized view

The following example enables refresh and sets the refresh interval to 20 minutes on a materialized view:

ALTER MATERIALIZED VIEW mydataset.my_mv
SET OPTIONS (enable_refresh = true, refresh_interval_minutes = 20)
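Similarly, the max_staleness option from the table above could be set like this (a sketch; the four-hour interval is arbitrary):

ALTER MATERIALIZED VIEW mydataset.my_mv
SET OPTIONS (max_staleness = INTERVAL "4:0:0" HOUR TO SECOND)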

ALTER ORGANIZATION SET OPTIONS statement

Sets the options on an organization.

Syntax

ALTER ORGANIZATION
SET OPTIONS (organization_set_options_list);

Arguments

  • organization_set_options_list: The list of options to set.

organization_set_options_list

The option list specifies options for the organization. Specify the options in the following format: NAME=VALUE, ...

The following options are supported:

NAME | VALUE | Details

default_kms_key_name

STRING

The default Cloud Key Management Service key for encrypting table data, including temporary or anonymous tables. For more information, see Customer-managed Cloud KMS keys.

Example: kms_key_name="projects/project_id/locations/location/keyRings/keyring/cryptoKeys/key"

This property is equivalent to the encryptionConfiguration.kmsKeyName table resource property.

default_time_zone

STRING

The default time zone to use in time zone-dependent SQL functions, when a time zone is not specified as an argument. For more information, see time zones.

Example: `region-us.default_time_zone` = "America/Los_Angeles". Sets the default time zone to America/Los_Angeles in the us region.

default_query_job_timeout_ms

INT64

The default time after which a query job times out. The timeout period must be between 5 minutes and 48 hours.

Example: `region-us.default_query_job_timeout_ms` = 1800000. Sets the default query job timeout to 30 minutes for all jobs in the us region.

default_interactive_query_queue_timeout_ms

INT64

The default amount of time that an interactive query is queued. If unset, the default is 6 hours. The minimum value is 1 millisecond. The maximum value is 48 hours. To disable interactive query queueing, set the value to -1.

Example: `region-us.default_interactive_query_queue_timeout_ms` = 1800000. Sets the default queue timeout for interactive queries in the us region to 30 minutes.

default_batch_query_queue_timeout_ms

INT64

The default amount of time that a batch query is queued. If unset, the default is 24 hours. The minimum value is 1 millisecond. The maximum value is 48 hours. To disable batch query queueing, set the value to -1.

Example: `region-us.default_batch_query_queue_timeout_ms` = 1800000. Sets the default queue timeout for batch queries in the us region to 30 minutes.

default_query_optimizer_options

STRING

The history-based query optimizations. This option can be one of the following:

  • 'adaptive=on': Use history-based query optimizations.
  • 'adaptive=off': Don't use history-based query optimizations.
  • NULL (default): Use the default history-based query optimizations setting, which is equivalent to 'adaptive=on'.

Example: `region-us.default_query_optimizer_options` = 'adaptive=on'

query_runtime

STRING

Specifies whether the BigQuery query processor uses the advanced runtime. Set the query_runtime value to advanced to enable the advanced runtime before it's rolled out as the default runtime.

Example: `region-us.query_runtime` = 'advanced'. Enables the advanced runtime.

Setting the value replaces the existing value of that option for the organization, if there is one. Setting the value to NULL clears the organization's value for that option.

Required permissions

The ALTER ORGANIZATION SET OPTIONS statement requires the following IAM permissions:

Permission | Resource
bigquery.config.update | The organization to alter.

Examples

The following example sets the default time zone to America/Chicago and the default query job timeout to one hour for an organization in the US region:

ALTER ORGANIZATION
SET OPTIONS (
  `region-us.default_time_zone` = "America/Chicago",
  `region-us.default_query_job_timeout_ms` = 3600000);

The following example sets the default time zone, the default query job timeout, the default interactive and batch queue timeouts, and the default Cloud KMS key to NULL, clearing the organization-level default settings:

ALTER ORGANIZATION
SET OPTIONS (
  `region-us.default_time_zone` = NULL,
  `region-us.default_kms_key_name` = NULL,
  `region-us.default_query_job_timeout_ms` = NULL,
  `region-us.default_interactive_query_queue_timeout_ms` = NULL,
  `region-us.default_batch_query_queue_timeout_ms` = NULL);

ALTER PROJECT SET OPTIONS statement

Sets the options on a project.

Syntax

ALTER PROJECT project_id
SET OPTIONS (project_set_options_list);

Arguments

  • project_id: The name of the project you're altering. This argument is optional, and defaults to the project that runs this DDL query.
  • project_set_options_list: The list of options to set.

project_set_options_list

The option list specifies options for the project. Specify the options in the following format: NAME=VALUE, ...

The following options are supported:

NAME | VALUE | Details

default_kms_key_name

STRING

The default Cloud Key Management Service key for encrypting table data, including temporary or anonymous tables. For more information, see Customer-managed Cloud KMS keys.

Example: kms_key_name="projects/project_id/locations/location/keyRings/keyring/cryptoKeys/key"

This property is equivalent to the encryptionConfiguration.kmsKeyName table resource property.

default_time_zone

STRING

The default time zone to use in time zone-dependent SQL functions, when a time zone is not specified as an argument. For more information, see time zones.

Example: `region-us.default_time_zone` = "America/Los_Angeles". Sets the default time zone to America/Los_Angeles in the us region.

default_query_job_timeout_ms

INT64

The default time after which a query job times out. The timeout period must be between 5 minutes and 48 hours.

Example: `region-us.default_query_job_timeout_ms` = 1800000. Sets the default query job timeout to 30 minutes for jobs run in the us region.

default_interactive_query_queue_timeout_ms

INT64

The default amount of time that an interactive query is queued. If unset, the default is 6 hours. The minimum value is 1 millisecond. The maximum value is 48 hours. To disable interactive query queueing, set the value to -1.

Example: `region-us.default_interactive_query_queue_timeout_ms` = 1800000. Sets the default queue timeout for interactive queries in the us region to 30 minutes.

default_batch_query_queue_timeout_ms

INT64

The default amount of time that a batch query is queued. If unset, the default is 24 hours. The minimum value is 1 millisecond. The maximum value is 48 hours. To disable batch query queueing, set the value to -1.

Example: `region-us.default_batch_query_queue_timeout_ms` = 1800000. Sets the default queue timeout for batch queries in the us region to 30 minutes.

default_query_optimizer_options

STRING

The history-based query optimizations. This option can be one of the following:

  • 'adaptive=on': Use history-based query optimizations.
  • 'adaptive=off': Don't use history-based query optimizations.
  • NULL (default): Use the default history-based query optimizations setting, which is equivalent to 'adaptive=on'.

Example: `region-us.default_query_optimizer_options` = 'adaptive=on'

default_cloud_resource_connection_id

STRING

The default connection to use when creating tables and models (Preview). Only specify the connection's ID, and exclude the attached project ID and region prefixes. Using default connections can cause the permissions granted to the connection's service account to be updated, depending on the type of table or model you create. For more information, see the Default connection overview.

Example: `region-us.default_cloud_resource_connection_id` = "connection_1". Sets the default connection to connection_1 in the us region.

default_sql_dialect_option

STRING

The default SQL query dialect for executing query jobs using the bq command-line tool or BigQuery API. Changing this setting doesn't affect the default dialect in the console. This option can be one of the following:

  • 'default_legacy_sql' (default): Use legacy SQL if the query dialect isn't specified at the job level.
  • 'default_google_sql': Use GoogleSQL if the query dialect isn't specified at the job level.
  • 'only_google_sql': Use GoogleSQL if the query dialect isn't specified at the job level. Reject jobs with the query dialect set to legacy SQL.
  • NULL: Use the default query dialect setting, which is equivalent to 'default_legacy_sql'.

Example: `region-us.default_sql_dialect_option` = 'default_google_sql'. Use GoogleSQL if the query dialect isn't specified at the job level.

query_runtimeSTRING

Specifies whether the BigQuery query processor uses theadvanced runtime. Set thequery_runtime value toadvanced to enable the advanced runtime before it's rolled out as the default runtime.

Example:`region-us.query_runtime` = 'advanced'. Enables the advanced runtime.

enable_reservation_based_fairnessBOOL

Determines how idle slots are shared. Iffalse (default), idle slots are equally distributed across all query projects. Iftrue, idle slots are shared equally across all reservations first, and then across projects within the reservation. For more information, seereservation-based fairness.

Example:`region-us.enable_reservation_based_fairness` = true. Enables reservation-based fairness.

Setting the value replaces the existing value of that option for the project, if there was one. Setting the value to NULL clears the project's value for that option.

Required permissions

This statement requires the following IAM permissions:

Permission | Resource
bigquery.config.update | The project to alter.

Examples

The following example sets the default time zone to America/New_York and the default query job timeout to 30 minutes for a project in the us region.

ALTER PROJECT project_id
SET OPTIONS (
  `region-us.default_time_zone` = "America/New_York",
  `region-us.default_query_job_timeout_ms` = 1800000);

The following example sets the default time zone, default query job timeout, default Cloud KMS key, default interactive and batch queue timeouts, and default SQL dialect to NULL, clearing the project-level default settings:

ALTER PROJECT project_id
SET OPTIONS (
  `region-us.default_time_zone` = NULL,
  `region-us.default_kms_key_name` = NULL,
  `region-us.default_query_job_timeout_ms` = NULL,
  `region-us.default_interactive_query_queue_timeout_ms` = NULL,
  `region-us.default_batch_query_queue_timeout_ms` = NULL,
  `region-us.default_sql_dialect_option` = NULL);

ALTER BI_CAPACITY SET OPTIONS statement

Sets the options on BigQuery BI Engine capacity.

Syntax

ALTER BI_CAPACITY `project_id.location_id.default`
SET OPTIONS (bi_capacity_options_list)

Arguments

  • project_id: Optional project ID of the project that will benefit from BI Engine acceleration. If omitted, the query project ID is used.

  • location_id: The location where data needs to be cached, prefixed with region-. Examples: region-us, region-us-central1.

  • bi_capacity_options_list: The list of options to set.

bi_capacity_options_list

The option list specifies a set of options for BigQuery BI Engine capacity.

Specify the options in the following format:

NAME=VALUE, ...

The following options are supported:

NAME | VALUE | Details
size_gb | INT64 | Specifies the size of the reservation in gigabytes.
preferred_tables | ARRAY<STRING> | List of tables that acceleration should be applied to. Format: project.dataset.table or dataset.table. If the project is omitted, the query project is used.

Setting VALUE replaces the existing value of that option for the BI Engine capacity, if there is one. Setting VALUE to NULL clears the value for that option.

Required permissions

This statement requires the following IAM permissions:

Permission | Resource
bigquery.bireservations.update | BI Engine reservation

Examples

Allocating BI Engine capacity without preferred tables

ALTER BI_CAPACITY `my-project.region-us.default`
SET OPTIONS (size_gb = 250)

Deallocating BI capacity

ALTER BI_CAPACITY `my-project.region-us.default`
SET OPTIONS (size_gb = 0)

Removing a set of preferred tables from the reservation

ALTER BI_CAPACITY `my-project.region-us.default`
SET OPTIONS (preferred_tables = NULL)

Allocating BI Engine capacity with a preferred tables list

ALTER BI_CAPACITY `my-project.region-us.default`
SET OPTIONS (
  size_gb = 250,
  preferred_tables = ["data_project1.dataset1.table1", "data_project2.dataset2.table2"])

Overwriting the list of preferred tables without changing the size

ALTER BI_CAPACITY `region-us.default`
SET OPTIONS (
  preferred_tables = ["dataset1.table1", "data_project2.dataset2.table2"])

ALTER CAPACITY SET OPTIONS statement

Alters an existing capacity commitment.

Syntax

ALTER CAPACITY `project_id.location_id.commitment_id`
SET OPTIONS (alter_capacity_commitment_option_list);

Arguments

  • project_id: The project ID of the administration project that maintains ownership of this commitment.
  • location_id: The location of the commitment.
  • commitment_id: The ID of the commitment. The value must be unique to the project and location. It must start and end with a lowercase letter or a number and contain only lowercase letters, numbers, and dashes.
  • alter_capacity_commitment_option_list: The options you can set to alter the capacity commitment.

alter_capacity_commitment_option_list

The option list specifies options for the capacity commitment. Specify the options in the following format: NAME=VALUE, ...

The following options are supported:

NAME | TYPE | Details
plan | STRING | The commitment plan to purchase. Supported values include: ANNUAL, THREE_YEAR, and TRIAL. For more information, see slot commitments.
renewal_plan | STRING | The plan this capacity commitment is converted to after commitment_end_time passes. Once the plan is changed, the committed period is extended according to the commitment plan. Applicable for ANNUAL, THREE_YEAR, and TRIAL commitments.

Required permissions

This statement requires the following IAM permissions:

Permission | Resource
bigquery.capacityCommitments.update | The administration project that maintains ownership of the commitments.

Examples

The following example changes a capacity commitment to a three-year plan that is located in the region-us region and managed by the project admin_project:

ALTER CAPACITY `admin_project.region-us.my-commitment`
SET OPTIONS (plan = 'THREE_YEAR');
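
To change how the commitment renews when commitment_end_time passes, set renewal_plan instead. The following is a minimal sketch, assuming the same commitment as above and an ANNUAL renewal plan:

ALTER CAPACITY `admin_project.region-us.my-commitment`
SET OPTIONS (renewal_plan = 'ANNUAL');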

ALTER RESERVATION SET OPTIONS statement

Alters an existing reservation.

Syntax

ALTER RESERVATION `project_id.location_id.reservation_id`
SET OPTIONS (alter_reservation_option_list);

Arguments

  • project_id: The project ID of the administration project that maintains ownership of this reservation.
  • location_id: The location of the reservation.
  • reservation_id: The ID of the reservation. The value must be unique to the project and location. It must start and end with a lowercase letter or a number and contain only lowercase letters, numbers, and dashes.
  • alter_reservation_option_list: The options you can set to alter the reservation.

alter_reservation_option_list

The option list specifies options for the reservation. Specify the options in the following format: NAME=VALUE, ...

The following options are supported:

NAME | TYPE | Details
ignore_idle_slots | BOOLEAN | If the value is true, then the reservation uses only the slots that are provisioned to it. The default value is false. For more information, see Idle slots.
slot_capacity | INTEGER | The number of slots to allocate to the reservation. If this reservation was created with an edition, this is equivalent to the amount of baseline slots.
target_job_concurrency | INTEGER | A soft upper bound on the number of jobs that can run concurrently in this reservation.
autoscale_max_slots | INTEGER | The maximum number of slots that can be added to the reservation by autoscaling.
secondary_location | STRING | The secondary location to use in the case of disaster recovery.
is_primary | BOOLEAN | If the value is true, the reservation is set to be the primary reservation.
labels | ARRAY<STRUCT<STRING, STRING>> | An array of labels for the reservation, expressed as key-value pairs.

Required permissions

This statement requires the following IAM permissions:

Permission | Resource
bigquery.reservations.update | The administration project that maintains ownership of the commitments.

Examples

Autoscaling example

The following example changes an autoscaling reservation to 300 baseline slots and 400 autoscaling slots for a max reservation size of 700. These slots are located in the region-us region and managed by the project admin_project:

ALTER RESERVATION `admin_project.region-us.my-reservation`
SET OPTIONS (
  slot_capacity = 300,
  autoscale_max_slots = 400);
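
Options example

The other options in the table follow the same pattern. As a sketch, the following statement enables ignore_idle_slots and attaches a label to the same hypothetical reservation; the ("team", "analytics") pair is an illustrative placeholder:

ALTER RESERVATION `admin_project.region-us.my-reservation`
SET OPTIONS (
  ignore_idle_slots = true,
  labels = [("team", "analytics")]);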

ALTER VECTOR INDEX REBUILD statement

Preview

This feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of the Service Specific Terms. Pre-GA features are available "as is" and might have limited support. For more information, see the launch stage descriptions.

Note: To provide feedback or request support for this feature, send email to bq-vector-search@google.com.

Rebuilds a vector index on a table.

Syntax

ALTER VECTOR INDEX [IF EXISTS] index_name ON table_name REBUILD;

Arguments

  • IF EXISTS: If no vector index exists with that name, the statement has no effect.

  • index_name: The name of the vector index to rebuild.

  • table_name: The name of the table that the vector index is on. See Table path syntax.

    If the table doesn't contain a vector index, or if the table contains a different vector index than the one specified in the index_name argument, the query fails.

  • REBUILD: Indicates that the statement rebuilds the vector index. This argument is required.

Details

Use the ALTER VECTOR INDEX REBUILD statement to rebuild an active vector index on a table without having to drop the vector index, and without any index downtime. When you run the statement, BigQuery creates a shadow index on the table and trains it in the background. BigQuery promotes the shadow index to be the active index when the shadow index has enough coverage.

To run the ALTER VECTOR INDEX REBUILD statement, you must create a reservation assignment with a job type of BACKGROUND for the project that contains the table. If you run the statement without an appropriate reservation, the query fails.
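
For reference, such an assignment can be created with the CREATE ASSIGNMENT DDL statement. The following is a minimal sketch; the reservation my-reservation, the assignment ID my-assignment, and the data project my-project are hypothetical placeholders:

CREATE ASSIGNMENT `admin_project.region-us.my-reservation.my-assignment`
OPTIONS (
  assignee = 'projects/my-project',
  job_type = 'BACKGROUND');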

You can have only one vector index rebuild in progress at a time. The ALTER VECTOR INDEX REBUILD statement completes before the shadow index replaces the active index, because the shadow index training and cutover happen asynchronously. If you start another vector index rebuild before the shadow index replaces the initial index, the second rebuild request fails.

Required permissions

To get the permissions that you need to alter vector indexes, ask your administrator to grant you the BigQuery Data Editor (roles/bigquery.dataEditor) or BigQuery Data Owner (roles/bigquery.dataOwner) IAM role on your table. For more information about granting roles, see Manage access to projects, folders, and organizations.

You might also be able to get the required permissions through custom roles or other predefined roles.

Examples

The following example rebuilds the index1 vector index on the sales table:

ALTER VECTOR INDEX IF EXISTS index1 ON mydataset.sales REBUILD;

ALTER DATA_POLICY statement

Preview

This feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of the Service Specific Terms. Pre-GA features are available "as is" and might have limited support. For more information, see the launch stage descriptions.

Note: To provide feedback or request support for this feature, send an email to bigquery-security@google.com.

Alters a data policy.

Syntax

ALTER DATA_POLICY [IF EXISTS] `project_id.region-location_id.data_policy_id`
SET OPTIONS (alter_option_list);

Arguments

  • IF EXISTS: If no data policy exists with that name, the statement has no effect.

  • project_id: The project ID of the project where the data policy resides.

  • location_id: The location of the data policy.

  • data_policy_id: The name of the data policy to be updated.

  • alter_option_list: The list of options to update on the data policy.

alter_option_list

The option list specifies options for the data policy. Specify the options in the following format: NAME=VALUE, ...

The following options are supported:

NAME | VALUE | Details
data_policy_type | STRING | Set it to DATA_MASKING_POLICY.
masking_expression | STRING | Specifies the predefined masking rule or a custom masking routine.

Required permissions

The user or service account that updates a data policy must have the bigquery.dataPolicies.update permission.

The bigquery.dataPolicies.update permission is included in the BigQuery Data Policy Admin, BigQuery Admin, and BigQuery Data Owner roles.
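
Example

The following is a minimal sketch of updating a data policy's masking rule. The policy name email_mask_policy and the masking rule value SHA256 are hypothetical placeholders; consult the masking rule documentation for the values available to you.

ALTER DATA_POLICY IF EXISTS `myproject.region-us.email_mask_policy`
SET OPTIONS (masking_expression = 'SHA256');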

DROP SCHEMA statement

Deletes a dataset.

Syntax

DROP [EXTERNAL] SCHEMA [IF EXISTS] [project_name.]dataset_name [CASCADE | RESTRICT]

Arguments

  • EXTERNAL: Specifies that the dataset is a federated dataset. The DROP EXTERNAL statement only removes the external definition from BigQuery. The data stored in the external location is not affected. An example appears at the end of the Examples section below.

  • IF EXISTS: If no dataset exists with that name, the statement has no effect.

  • project_name: The name of the project that contains the dataset. Defaults to the project that runs this DDL statement.

  • dataset_name: The name of the dataset to delete.

  • CASCADE: Deletes the dataset and all resources within the dataset, such as tables, views, and functions. You must have permission to delete the resources, or else the statement returns an error. For a list of BigQuery permissions, see Predefined roles and permissions.

  • RESTRICT: Deletes the dataset only if it's empty. Otherwise, returns an error. If you don't specify either CASCADE or RESTRICT, then the default behavior is RESTRICT.

Details

The statement runs in the location of the dataset if it exists, unless you specify the location in the query settings. For more information, see Specifying your location.

Required permissions

This statement requires the following IAM permissions:

Permission | Resource
bigquery.datasets.delete | The dataset to delete.
bigquery.tables.delete | The dataset to delete. If the dataset is empty, then this permission is not required.

Examples

The following example deletes the dataset named mydataset. If the dataset does not exist or is not empty, then the statement returns an error.

DROP SCHEMA mydataset

The following example drops the dataset named mydataset and any resources in that dataset. If the dataset does not exist, then no error is returned.

DROP SCHEMA IF EXISTS mydataset CASCADE
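
The EXTERNAL keyword follows the same pattern. The following is a minimal sketch, assuming a federated dataset named myfederateddataset whose external definition you want to remove:

DROP EXTERNAL SCHEMA IF EXISTS myfederateddataset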

UNDROP SCHEMA statement

Preview

This feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of the Service Specific Terms. Pre-GA features are available "as is" and might have limited support. For more information, see the launch stage descriptions.

Undeletes a dataset within your time travel window.

Syntax

UNDROP SCHEMA [IF NOT EXISTS] [project_name.]dataset_name [OPTIONS (location = "us")]

Arguments

  • IF NOT EXISTS: If a dataset already exists with that name, the statement has no effect.

  • project_name: The name of the project that contained the deleted dataset. Defaults to the project that runs this DDL statement.

  • dataset_name: The name of the dataset to undelete.

  • location: Original location of the deleted dataset.

Details

When you run this statement, you must specify the location where the dataset was deleted. If you don't, the US multi-region is used.

Required permissions

This statement requires the following IAM permissions:

Permission | Resource
bigquery.datasets.create | The project where you are undeleting the dataset.
bigquery.datasets.get | The dataset that you are undeleting.

Examples

The following example undeletes the dataset named mydataset. If the dataset already exists or has passed the time travel window, then the statement returns an error.

UNDROP SCHEMA mydataset;
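
You can also state the original location explicitly with the OPTIONS clause from the syntax above. The following is a minimal sketch, assuming mydataset was deleted in the us multi-region:

UNDROP SCHEMA mydataset
OPTIONS (location = "us");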

DROP TABLE statement

Deletes a table or table clone.

Syntax

DROP TABLE [IF EXISTS] table_name

Arguments

  • IF EXISTS: If no table exists with that name, the statement has no effect.

  • table_name: The name of the table to delete. See Table path syntax.

Required permissions

This statement requires the following IAM permissions:

Permission | Resource
bigquery.tables.delete | The table to delete.
bigquery.tables.get | The table to delete.

Examples

Deleting a table

The following example deletes a table named mytable in the mydataset dataset:

DROP TABLE mydataset.mytable

If the table name does not exist in the dataset, the following error is returned:

Error: Not found: Table myproject:mydataset.mytable

Deleting a table only if the table exists

The following example deletes a table named mytable in mydataset only if the table exists. If the table name does not exist in the dataset, no error is returned, and no action is taken.

DROP TABLE IF EXISTS mydataset.mytable

DROP SNAPSHOT TABLE statement

Deletes a table snapshot.

Syntax

DROP SNAPSHOT TABLE [IF EXISTS] table_snapshot_name

Arguments

  • IF EXISTS: If no table snapshot exists with that name, then the statement has no effect.

  • table_snapshot_name: The name of the table snapshot to delete. See Table path syntax.

Required permissions

This statement requires the following IAM permissions:

Permission | Resource
bigquery.tables.deleteSnapshot | The table snapshot to delete.

Examples

Delete a table snapshot: fail if it doesn't exist

The following example deletes the table snapshot named mytablesnapshot in the mydataset dataset:

DROP SNAPSHOT TABLE mydataset.mytablesnapshot

If the table snapshot does not exist in the dataset, then the following error is returned:

Error: Not found: Table snapshot myproject:mydataset.mytablesnapshot

Delete a table snapshot: ignore if it doesn't exist

The following example deletes the table snapshot named mytablesnapshot in the mydataset dataset.

DROP SNAPSHOT TABLE IF EXISTS mydataset.mytablesnapshot

If the table snapshot doesn't exist in the dataset, then no action is taken, and no error is returned.

For information about creating table snapshots, see CREATE SNAPSHOT TABLE.

For information about restoring table snapshots, see CREATE TABLE CLONE.

DROP EXTERNAL TABLE statement

Deletes an external table.

Syntax

DROP EXTERNAL TABLE [IF EXISTS] table_name

Arguments

  • IF EXISTS: If no external table exists with that name, then the statement has no effect.

  • table_name: The name of the external table to delete. See Table path syntax.

Details

If table_name exists but is not an external table, the statement returns the following error:

Cannot drop table_name which has type TYPE. An external table was expected.

The DROP EXTERNAL statement only removes the external table definition from BigQuery. The data stored in the external location is not affected.

Required permissions

This statement requires the following IAM permissions:

Permission | Resource
bigquery.tables.delete | The external table to delete.
bigquery.tables.get | The external table to delete.

Examples

The following example drops the external table named external_table from the dataset mydataset. It returns an error if the external table does not exist.

DROP EXTERNAL TABLE mydataset.external_table

The following example drops the external table named external_table from the dataset mydataset. If the external table does not exist, no error is returned.

DROP EXTERNAL TABLE IF EXISTS mydataset.external_table

DROP VIEW statement

Deletes a view.

Syntax

DROP VIEW [IF EXISTS] view_name

Arguments

  • IF EXISTS: If no view exists with that name, the statement has no effect.

  • view_name: The name of the view to delete. See Table path syntax.

Required permissions

This statement requires the following IAM permissions:

Permission | Resource
bigquery.tables.delete | The view to delete.
bigquery.tables.get | The view to delete.

Examples

Deleting a view

The following example deletes a view named myview in mydataset:

DROP VIEW mydataset.myview

If the view name does not exist in the dataset, the following error is returned:

Error: Not found: Table myproject:mydataset.myview

Deleting a view only if the view exists

The following example deletes a view named myview in mydataset only if the view exists. If the view name does not exist in the dataset, no error is returned, and no action is taken.

DROP VIEW IF EXISTS mydataset.myview

DROP MATERIALIZED VIEW statement

Deletes a materialized view.

Syntax

DROP MATERIALIZED VIEW [IF EXISTS] mv_name

Arguments

  • IF EXISTS: If no materialized view exists with that name, the statement has no effect.

  • mv_name: The name of the materialized view to delete. See Table path syntax.

Required permissions

This statement requires the following IAM permissions:

Permission | Resource
bigquery.tables.delete | The materialized view to delete.
bigquery.tables.get | The materialized view to delete.

Examples

Deleting a materialized view

The following example deletes a materialized view named my_mv in mydataset:

DROP MATERIALIZED VIEW mydataset.my_mv

If the materialized view name does not exist in the dataset, the following error is returned:

Error: Not found: Table myproject:mydataset.my_mv

If you are deleting a materialized view in another project, you must specify the project, dataset, and materialized view in the following format: `project_id.dataset.materialized_view` (including the backticks if project_id contains special characters); for example, `myproject.mydataset.my_mv`.

Deleting a materialized view only if it exists

The following example deletes a materialized view named my_mv in mydataset only if the materialized view exists. If the materialized view name does not exist in the dataset, no error is returned, and no action is taken.

DROP MATERIALIZED VIEW IF EXISTS mydataset.my_mv

If you are deleting a materialized view in another project, you must specify the project, dataset, and materialized view in the following format: `project_id.dataset.materialized_view` (including the backticks if project_id contains special characters); for example, `myproject.mydataset.my_mv`.

DROP FUNCTION statement

Deletes a persistent user-defined function (UDF) or user-defined aggregate function (UDAF).

Syntax

DROP FUNCTION [IF EXISTS] [[project_name.]dataset_name.]function_name

Arguments

  • IF EXISTS: If no function exists with that name, the statement has no effect.

  • project_name: The name of the project containing the function to delete. Defaults to the project that runs this DDL query. If the project name contains special characters such as colons, it should be quoted in backticks ` (example: `google.com:my_project`).

  • dataset_name: The name of the dataset containing the function to delete. Defaults to the defaultDataset in the request.

  • function_name: The name of the function you're deleting.

Required permissions

This statement requires the following IAM permissions:

Permission | Resource
bigquery.routines.delete | The function to delete.

Examples

The following example statement deletes the function parseJsonAsStruct contained in the dataset mydataset.

DROP FUNCTION mydataset.parseJsonAsStruct;

The following example statement deletes the function parseJsonAsStruct from the dataset sample_dataset in the project other_project.

DROP FUNCTION `other_project`.sample_dataset.parseJsonAsStruct;

DROP TABLE FUNCTION

Preview

This feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of the Service Specific Terms. Pre-GA features are available "as is" and might have limited support. For more information, see the launch stage descriptions.

Deletes a table function.

Syntax

DROP TABLE FUNCTION [IF EXISTS] [[project_name.]dataset_name.]function_name

Arguments

  • IF EXISTS: If no table function exists with this name, the statement has no effect.

  • project_name: The name of the project containing the table function to delete. Defaults to the project that runs this DDL query.

  • dataset_name: The name of the dataset containing the table function to delete.

  • function_name: The name of the table function to delete.

Required permissions

This statement requires the following IAM permissions:

Permission | Resource
bigquery.routines.delete | The table function to delete.

Example

The following example deletes a table function named my_table_function:

DROP TABLE FUNCTION mydataset.my_table_function;

DROP PROCEDURE statement

Deletes a stored procedure.

Syntax

DROP PROCEDURE [IF EXISTS] [[project_name.]dataset_name.]procedure_name

Arguments

  • IF EXISTS: If no procedure exists with that name, the statement has no effect.

  • project_name: The name of the project containing the procedure to delete. Defaults to the project that runs this DDL query. If the project name contains special characters such as colons, it should be quoted in backticks ` (example: `google.com:my_project`).

  • dataset_name: The name of the dataset containing the procedure to delete. Defaults to the defaultDataset in the request.

  • procedure_name: The name of the procedure you're deleting.

Required permissions

This statement requires the following IAM permissions:

Permission | Resource
bigquery.routines.delete | The procedure to delete.

Examples

The following example statement deletes the procedure myProcedure contained in the dataset mydataset.

DROP PROCEDURE mydataset.myProcedure;

The following example statement deletes the procedure myProcedure from the dataset sample_dataset in the project other-project.

DROP PROCEDURE `other-project`.sample_dataset.myProcedure;

DROP ROW ACCESS POLICY statement

Deletes a row-level access policy.

Important: You cannot delete the last row-level access policy from a table using DROP ROW ACCESS POLICY. Attempting to do so results in an error. To delete the last row-level access policy on a table, you must use DROP ALL ROW ACCESS POLICIES instead.

Syntax

DROP ROW ACCESS POLICY [IF EXISTS] row_access_policy_name ON table_name;
DROP ALL ROW ACCESS POLICIES ON table_name;

Arguments

  • IF EXISTS: If no row-level access policy exists with that name, the statement has no effect.

  • row_access_policy_name: The name of the row-level access policy that you are deleting. Each row-level access policy on a table has a unique name.

  • table_name: The name of the table with the row-level access policy or policies that you want to delete.

Required permissions

This statement requires the following IAM permissions:

Permission | Resource
bigquery.rowAccessPolicies.delete | The row-level access policy to delete.
bigquery.rowAccessPolicies.setIamPolicy | The row-level access policy to delete.
bigquery.rowAccessPolicies.list | The table to delete all row-level access policies on. Only required for DROP ALL statements.

Examples

Delete a row-level access policy from a table:

DROP ROW ACCESS POLICY my_row_filter ON project.dataset.my_table;

Delete all the row-level access policies from a table:

DROP ALL ROW ACCESS POLICIES ON project.dataset.my_table;

DROP CAPACITY statement

Deletes a capacity commitment.

Syntax

DROP CAPACITY [IF EXISTS] project_id.location.capacity-commitment-id

Arguments

  • IF EXISTS: If no capacity commitment exists with that ID, the statement has no effect.
  • project_id: The project ID of the administration project where the reservation was created.
  • location: The location of the commitment.
  • capacity-commitment-id: The capacity commitment ID.

To find the capacity commitment ID, query the INFORMATION_SCHEMA.CAPACITY_COMMITMENTS_BY_PROJECT table.

Required permissions

This statement requires the following IAM permissions:

Permission | Resource
bigquery.capacityCommitments.delete | The administration project that maintains ownership of the commitments.

Example

The following example deletes the capacity commitment:

DROP CAPACITY `admin_project.region-us.1234`

DROP RESERVATION statement

Deletes a reservation.

Syntax

DROP RESERVATION [IF EXISTS] project_id.location.reservation_id

Arguments

  • IF EXISTS: If no reservation exists with that ID, the statement has no effect.
  • project_id: The project ID of the administration project where the reservation was created.
  • location: The location of the reservation.
  • reservation_id: The reservation ID.

Required permissions

This statement requires the following IAM permissions:

Permission | Resource
bigquery.reservations.delete | The administration project that maintains ownership of the commitments.

Example

The following example deletes the reservation prod:

DROP RESERVATION `admin_project.region-us.prod`

DROP ASSIGNMENT statement

Deletes a reservation assignment.

Syntax

DROP ASSIGNMENT [IF EXISTS] project_id.location.reservation_id.assignment_id

Arguments

  • IF EXISTS: If no assignment exists with that ID, the statement has no effect.
  • project_id: The project ID of the administration project where the reservation was created.
  • location: The location of the reservation.
  • reservation_id: The reservation ID.
  • assignment_id: The assignment ID.

To find the assignment ID, query the INFORMATION_SCHEMA.ASSIGNMENTS view.

Required permissions

This statement requires the following IAM permissions:

Permission | Resource
bigquery.reservationAssignments.delete | The administration project and the assignee.

Example

The following example deletes an assignment from the reservation named prod:

DROP ASSIGNMENT `admin_project.region-us.prod.1234`

DROP SEARCH INDEX statement

Deletes a search index on a table.

Syntax

DROP SEARCH INDEX [IF EXISTS] index_name ON table_name

Arguments

  • IF EXISTS: If no search index exists with that name on the table, the statement has no effect.
  • index_name: The name of the search index to be deleted.
  • table_name: The name of the table with the index.

Required permissions

This statement requires the following IAM permissions:

Permission | Resource
bigquery.tables.deleteIndex | The table with the search index to delete.

Example

The following example deletes a search index my_index from my_table:

DROP SEARCH INDEX my_index ON dataset.my_table;

DROP VECTOR INDEX statement

Deletes a vector index on a table.

Syntax

DROP VECTOR INDEX [IF EXISTS] index_name ON table_name

Arguments

  • IF EXISTS: If no vector index exists with that name on the table, the statement has no effect.
  • index_name: The name of the vector index to be deleted.
  • table_name: The name of the table with the vector index.

Required permissions

This statement requires the following IAM permissions:

Permission | Resource
bigquery.tables.deleteIndex | The table with the vector index to delete.

Example

The following example deletes a vector index my_index from my_table:

DROP VECTOR INDEX my_index ON dataset.my_table;

DROP DATA_POLICY statement

Preview

This feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of the Service Specific Terms. Pre-GA features are available "as is" and might have limited support. For more information, see the launch stage descriptions.

Note: To provide feedback or request support for this feature, send an email to bigquery-security@google.com.

Deletes a data policy in a project.

Syntax

DROP DATA_POLICY [IF EXISTS] `project_id.region-location_id.data_policy_id`;

Arguments

  • IF EXISTS: If no data policy exists with that name, the statement has no effect.

  • project_id: The project ID of the project where the data policy resides.

  • location_id: The location of the data policy.

  • data_policy_id: The name of the data policy to be deleted.

Required permissions

The user or service account that deletes a data policy must have the bigquery.dataPolicies.delete permission. This permission is included in the BigQuery Data Policy Admin, BigQuery Admin, and BigQuery Data Owner roles.
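
Example

The following is a minimal sketch that deletes a hypothetical data policy named email_mask_policy in the us region:

DROP DATA_POLICY IF EXISTS `myproject.region-us.email_mask_policy`;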

Table path syntax

Use the following syntax when specifying the path of a table resource, including standard tables, views, materialized views, external tables, and table snapshots.

table_path := [[project_name.]dataset_name.]table_name

  • project_name: The name of the project that contains the table resource. Defaults to the project that runs the DDL query. If the project name contains special characters such as colons, quote the name in backticks ` (example: `google.com:my_project`).

  • dataset_name: The name of the dataset that contains the table resource. Defaults to the defaultDataset in the request.

  • table_name: The name of the table resource.

When you create a table in BigQuery, the table name must be unique per dataset. The table name can:

  • Contain characters with a total of up to 1,024 UTF-8 bytes.
  • Contain Unicode characters in category L (letter), M (mark), N (number), Pc (connector, including underscore), Pd (dash), Zs (space). For more information, see General Category.

The following are all examples of valid table names: table 01, ग्राहक, 00_お客様, étudiant-01.

Caveats:

  • Table names are case-sensitive by default. mytable and MyTable can coexist in the same dataset, unless they are part of a dataset with case-sensitivity turned off.
  • Some table names and table name prefixes are reserved. If you receive an error saying that your table name or prefix is reserved, then select a different name and try again.
  • If you include multiple dot operators (.) in a sequence, the duplicate operators are implicitly stripped.

    For example, this: project_name....dataset_name..table_name

    Becomes this: project_name.dataset_name.table_name
