Load statements in GoogleSQL
LOAD DATA statement
Loads data from one or more files into a table. The statement can create a new table, append data into an existing table or partition, or overwrite an existing table or partition. If the `LOAD DATA` statement fails, the table into which you are loading data remains unchanged.
Syntax
```
LOAD DATA {OVERWRITE | INTO} [{TEMP | TEMPORARY} TABLE]
[[project_name.]dataset_name.]table_name
[(column_list)]
[[OVERWRITE] PARTITIONS (partition_column_name=partition_value)]
[PARTITION BY partition_expression]
[CLUSTER BY clustering_column_list]
[OPTIONS (table_option_list)]
FROM FILES(load_option_list)
[WITH PARTITION COLUMNS [(partition_column_list)]]
[WITH CONNECTION connection_name]

column_list:
  column[, ...]

partition_column_list:
  partition_column_name, partition_column_type[, ...]
```

Arguments
- `INTO`: If a table with this name already exists, the statement appends data to the table. You must use `INTO` instead of `OVERWRITE` if your statement includes the `PARTITIONS` clause.
- `OVERWRITE`: If a table with this name already exists, the statement overwrites the table.
- `{TEMP|TEMPORARY} TABLE`: Use this clause to create or write to a temporary table.
- `project_name`: The name of the project for the table. The value defaults to the project that runs this query.
- `dataset_name`: The name of the dataset for the table.
- `table_name`: The name of the table.
- `column_list`: Contains the table's schema information as a list of table columns. For more information about table schemas, see Specifying a schema. If you don't specify a schema, BigQuery uses schema auto-detection to infer the schema.

  When you load hive-partitioned data into a new table or overwrite an existing table, that table schema contains the hive-partitioned columns and the columns in the `column_list`.

  If you append hive-partitioned data to an existing table, then the hive-partitioned columns and `column_list` can be a subset of the existing columns. If the combined list of columns is not a subset of the existing columns, then the following rules apply:

  - If your data is self-describing, such as ORC, PARQUET, or AVRO, then columns in the source file that are omitted from the `column_list` are ignored. Columns in the `column_list` that don't exist in the source file are written with `NULL` values. If a column is in both the `column_list` and the source file, then their types must match.
  - If your data is not self-describing, such as CSV or JSON, then columns in the source file that are omitted from the `column_list` are only ignored if you set `ignore_unknown_values` to `TRUE`. Otherwise this statement returns an error. You can't list columns in the `column_list` that don't exist in the source file.
- `[OVERWRITE] PARTITIONS`: Use this clause to write to or overwrite exactly one partition. When you use this clause, the statement must begin with `LOAD DATA INTO`. For an example, see the sketch after this list.
  - `partition_column_name`: The name of the partitioned column to write to. If you use both the `PARTITIONS` and the `PARTITION BY` clauses, then the column names must match.
  - `partition_value`: The `partition_id` of the partition to append or overwrite. To find the `partition_id` values of a table, query the `INFORMATION_SCHEMA.PARTITIONS` view. You can't set the `partition_value` to `__NULL__` or `__UNPARTITIONED__`. You can only append to or overwrite one partition. If your data contains values that belong to multiple partitions, then the statement fails with an error. The `partition_value` must be a literal value.
- `partition_expression`: Specifies the table partitioning when creating a new table.
- `clustering_column_list`: Specifies table clustering when creating a new table. The value is a comma-separated list of column names, with up to four columns.
- `table_option_list`: Specifies options for creating the table. If you include this clause and the table already exists, then the options must match the existing table specification.
- `partition_column_list`: A list of external partitioning columns.
- `connection_name`: The connection name that is used to read the source files from an external data source.
- `load_option_list`: Specifies options for loading the data.
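For example, the following sketch appends data into exactly one partition. It assumes a hypothetical `mydataset.sales` table partitioned on the `DATE` column `transaction_date`, and uses `INTO`, as the `PARTITIONS` clause requires:

```sql
-- Hypothetical example: append rows that all belong to the 2024-06-01 partition.
LOAD DATA INTO mydataset.sales
  PARTITIONS(transaction_date = DATE '2024-06-01')
  PARTITION BY transaction_date
  FROM FILES(
    format = 'PARQUET',
    uris = ['gs://bucket/path/sales_2024_06_01/*.parquet'])
```

If any loaded row falls outside that single partition, the statement fails with an error.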
If no table exists with the specified name, then the statement creates a new table. If a table already exists with the specified name, then the behavior depends on the `INTO` or `OVERWRITE` keyword. The `INTO` keyword appends the data to the table, and the `OVERWRITE` keyword overwrites the table.
If your external data uses a hive-partitioned layout, then include the `WITH PARTITION COLUMNS` clause. If you include the `WITH PARTITION COLUMNS` clause without `partition_column_list`, then BigQuery infers the partitioning from the data layout. If you include both `column_list` and `WITH PARTITION COLUMNS`, then `partition_column_list` is required.
To load data into a temporary table, you must include the `{TEMP|TEMPORARY} TABLE` clause.
column
(`column_name column_schema`[, ...]) contains the table's schema information in a comma-separated list. A column schema can include nested `ARRAY` or `STRUCT` elements.

```
column :=
  column_name column_schema

column_schema :=
  {
    simple_type
    | STRUCT<field_list>
    | ARRAY<array_element_schema>
  }
  [PRIMARY KEY NOT ENFORCED | REFERENCES table_name(column_name) NOT ENFORCED]
  [DEFAULT default_expression |
    GENERATED ALWAYS AS (generation_expression) STORED
    OPTIONS(generation_option_list)]
  [NOT NULL]
  [OPTIONS(column_option_list)]

simple_type := { data_type | STRING COLLATE collate_specification }

field_list := field_name column_schema [, ...]

array_element_schema := { simple_type | STRUCT<field_list> } [NOT NULL]
```
- `column_name` is the name of the column. A column name:
  - Must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_)
  - Must start with a letter or underscore
  - Can be up to 300 characters
- `column_schema`: Similar to a data type, but supports an optional `NOT NULL` constraint for types other than `ARRAY`. `column_schema` also supports options on top-level columns and `STRUCT` fields. `column_schema` can be used only in the column definition list of `CREATE TABLE` statements. It cannot be used as a type in expressions.
- `simple_type`: Any supported data type aside from `STRUCT` and `ARRAY`. If `simple_type` is a `STRING`, it supports an additional clause for collation, which defines how a resulting `STRING` can be compared and sorted. The syntax looks like this:

  `STRING COLLATE collate_specification`

  If you have `DEFAULT COLLATE collate_specification` assigned to the table, the collation specification for a column overrides the specification for the table.
- `default_expression`: The default value assigned to the column. You cannot specify `DEFAULT` if `GENERATED ALWAYS AS` is specified.
- `generation_expression`: (Preview) An expression for an automatically generated embedding column. Setting this field enables autonomous embedding generation on the table. The only supported `generation_expression` syntax is a call to the `AI.EMBED` function.
  - You can't specify `GENERATED ALWAYS AS` if `DEFAULT` is specified.
  - The `connection_id` argument to `AI.EMBED` is required when used in a generation expression.
  - The type of the column must be `STRUCT<result ARRAY<FLOAT64>, status STRING>`.
- `generation_option_list`: The options for a generated column. The only supported option is `asynchronous = TRUE`.
- `field_list`: Represents the fields in a struct.
- `field_name`: The name of the struct field. Struct field names have the same restrictions as column names.
- `NOT NULL`: When the `NOT NULL` constraint is present for a column or field, the column or field is created with `REQUIRED` mode. Conversely, when the `NOT NULL` constraint is absent, the column or field is created with `NULLABLE` mode.

  Columns and fields of `ARRAY` type do not support the `NOT NULL` modifier. For example, a `column_schema` of `ARRAY<INT64> NOT NULL` is invalid, since `ARRAY` columns have `REPEATED` mode and can be empty but cannot be `NULL`. An array element in a table can never be `NULL`, regardless of whether the `NOT NULL` constraint is specified. For example, `ARRAY<INT64>` is equivalent to `ARRAY<INT64 NOT NULL>`.

  The `NOT NULL` attribute of a table's `column_schema` does not propagate through queries over the table. If table `T` contains a column declared as `x INT64 NOT NULL`, for example, `CREATE TABLE dataset.newtable AS SELECT x FROM T` creates a table named `dataset.newtable` in which `x` is `NULLABLE`.
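To illustrate the column grammar, here is a minimal sketch (with hypothetical table, file, and field names) that combines a `NOT NULL` constraint, a collated `STRING`, and nested `STRUCT` and `ARRAY` types in a `LOAD DATA` column list:

```sql
-- Hypothetical example: an explicit column_list that uses the column_schema grammar.
LOAD DATA INTO mydataset.customers (
    id INT64 NOT NULL,                                -- created with REQUIRED mode
    name STRING COLLATE 'und:ci',                     -- case-insensitive collation
    tags ARRAY<STRING>,                               -- REPEATED mode; NOT NULL isn't allowed on ARRAY
    address STRUCT<city STRING, zip STRING NOT NULL>)
  FROM FILES(
    format = 'JSON',
    uris = ['gs://bucket/path/customers.json'])
```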
column_option_list
Specify a column option list in the following format:
NAME=VALUE, ...
`NAME` and `VALUE` must be one of the following combinations:
| NAME | VALUE | Details |
|---|---|---|
| `description` | `STRING` | This property is equivalent to the schema.fields[].description table resource property. |
| `rounding_mode` | `STRING` | Specifies the rounding mode that's used for values written to a `NUMERIC` or `BIGNUMERIC` typed column or `STRUCT` field. This property is equivalent to the roundingMode table resource property. |
| `data_policies` | `ARRAY<STRING>` | Applies a data policy to a column in a table (Preview). |
`VALUE` is a constant expression containing only literals, query parameters, and scalar functions.

The constant expression cannot contain:

- A reference to a table
- Subqueries or SQL statements such as `SELECT`, `CREATE`, or `UPDATE`
- User-defined functions, aggregate functions, or analytic functions
- The following scalar functions:
  - `ARRAY_TO_STRING`
  - `REPLACE`
  - `REGEXP_REPLACE`
  - `RAND`
  - `FORMAT`
  - `LPAD`
  - `RPAD`
  - `REPEAT`
  - `SESSION_USER`
  - `GENERATE_ARRAY`
  - `GENERATE_DATE_ARRAY`

Setting the `VALUE` replaces the existing value of that option for the column, if there was one. Setting the `VALUE` to `NULL` clears the column's value for that option.
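As an illustration, the following sketch (hypothetical names throughout) sets column options inside a `LOAD DATA` column list:

```sql
-- Hypothetical example: column-level options in a LOAD DATA column_list.
LOAD DATA INTO mydataset.transactions (
    id INT64 OPTIONS(description = 'Unique transaction identifier'),
    amount NUMERIC OPTIONS(rounding_mode = 'ROUND_HALF_EVEN'))
  FROM FILES(
    format = 'CSV',
    skip_leading_rows = 1,
    uris = ['gs://bucket/path/transactions.csv'])
```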
partition_expression
`PARTITION BY` is an optional clause that controls table and vector index partitioning. `partition_expression` is an expression that determines how to partition the table or vector index. The partition expression can contain the following values:
- `_PARTITIONDATE`. Partition by ingestion time with daily partitions. This syntax cannot be used with the `AS query_statement` clause.
- `DATE(_PARTITIONTIME)`. Equivalent to `_PARTITIONDATE`. This syntax cannot be used with the `AS query_statement` clause.
- `<date_column>`. Partition by a `DATE` column with daily partitions.
- `DATE({ <timestamp_column> | <datetime_column> })`. Partition by a `TIMESTAMP` or `DATETIME` column with daily partitions.
- `DATETIME_TRUNC(<datetime_column>, { DAY | HOUR | MONTH | YEAR })`. Partition by a `DATETIME` column with the specified partitioning type.
- `TIMESTAMP_TRUNC(<timestamp_column>, { DAY | HOUR | MONTH | YEAR })`. Partition by a `TIMESTAMP` column with the specified partitioning type.
- `TIMESTAMP_TRUNC(_PARTITIONTIME, { DAY | HOUR | MONTH | YEAR })`. Partition by ingestion time with the specified partitioning type. This syntax cannot be used with the `AS query_statement` clause.
- `DATE_TRUNC(<date_column>, { MONTH | YEAR })`. Partition by a `DATE` column with the specified partitioning type.
- `RANGE_BUCKET(<int64_column>, GENERATE_ARRAY(<start>, <end>[, <interval>]))`. Partition by an integer column with the specified range (see the sketch after this list), where:
  - `start` is the start of range partitioning, inclusive.
  - `end` is the end of range partitioning, exclusive.
  - `interval` is the width of each range within the partition. Defaults to 1.
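For example, the following sketch (hypothetical table and source path) creates an integer-range partitioned table while loading:

```sql
-- Hypothetical example: integer-range partitioning with RANGE_BUCKET.
-- Partitions cover customer_id values [0, 100) in ranges of width 10.
LOAD DATA INTO mydataset.orders
  PARTITION BY RANGE_BUCKET(customer_id, GENERATE_ARRAY(0, 100, 10))
  FROM FILES(
    format = 'PARQUET',
    uris = ['gs://bucket/path/orders/*.parquet'])
```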
table_option_list
The option list lets you set table options such as a label and an expiration time. You can include multiple options using a comma-separated list.
Specify a table option list in the following format:
NAME=VALUE, ...
`NAME` and `VALUE` must be one of the following combinations:
| NAME | VALUE | Details |
|---|---|---|
| `expiration_timestamp` | `TIMESTAMP` | This property is equivalent to the expirationTime table resource property. |
| `partition_expiration_days` | `FLOAT64` | Sets the partition expiration in days. For more information, see Set the partition expiration. By default, partitions don't expire. This property is equivalent to the timePartitioning.expirationMs table resource property but uses days instead of milliseconds. One day is equivalent to 86400000 milliseconds, or 24 hours. This property can only be set if the table is partitioned. |
| `require_partition_filter` | `BOOL` | Specifies whether queries on this table must include a predicate filter that filters on the partitioning column. For more information, see Set partition filter requirements. The default value is `FALSE`. This property is equivalent to the timePartitioning.requirePartitionFilter table resource property. This property can only be set if the table is partitioned. |
| `friendly_name` | `STRING` | This property is equivalent to the friendlyName table resource property. |
| `description` | `STRING` | This property is equivalent to the description table resource property. |
| `labels` | `ARRAY<STRUCT<STRING, STRING>>` | This property is equivalent to the labels table resource property. |
| `default_rounding_mode` | `STRING` | Specifies the default rounding mode that's used for values written to any new `NUMERIC` or `BIGNUMERIC` typed columns or `STRUCT` fields in the table. This property is equivalent to the defaultRoundingMode table resource property. |
| `enable_change_history` | `BOOL` | In preview. Set this property to `TRUE` to enable change history on the table. |
| `max_staleness` | `INTERVAL` | The maximum interval behind the current time where it's acceptable to read stale data. For example, with change data capture, when this option is set, the table copy operation is denied if data is more stale than the `max_staleness` value. |
| `enable_fine_grained_mutations` | `BOOL` | In preview. Set this property to `TRUE` to enable fine-grained DML mutations on the table. |
| `storage_uri` | `STRING` | In preview. A fully qualified location prefix for the external folder where data is stored. Supports Cloud Storage paths, for example `gs://bucket/path`. Required for managed tables. |
| `file_format` | `STRING` | In preview. The open-source file format in which the table data is stored. Only `PARQUET` is supported. Required for managed tables. The default is `PARQUET`. |
| `table_format` | `STRING` | In preview. The open table format in which metadata-only snapshots are stored. Only `ICEBERG` is supported. Required for managed tables. The default is `ICEBERG`. |
| `tags` | `ARRAY<STRUCT<STRING, STRING>>` | An array of IAM tags for the table, expressed as key-value pairs. The key should be the namespaced key name, and the value should be the short name. |
`VALUE` is a constant expression containing only literals, query parameters, and scalar functions.

The constant expression cannot contain:

- A reference to a table
- Subqueries or SQL statements such as `SELECT`, `CREATE`, or `UPDATE`
- User-defined functions, aggregate functions, or analytic functions
- The following scalar functions:
  - `ARRAY_TO_STRING`
  - `REPLACE`
  - `REGEXP_REPLACE`
  - `RAND`
  - `FORMAT`
  - `LPAD`
  - `RPAD`
  - `REPEAT`
  - `SESSION_USER`
  - `GENERATE_ARRAY`
  - `GENERATE_DATE_ARRAY`
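As a combined illustration, the following sketch (hypothetical names throughout) sets several of the table options above while loading into a new partitioned table:

```sql
-- Hypothetical example: several table options set at load time.
LOAD DATA INTO mydataset.events
  PARTITION BY DATE(event_timestamp)
  OPTIONS(
    friendly_name = 'Events',
    description = 'Raw event data',
    labels = [('team', 'analytics')],
    require_partition_filter = TRUE,
    partition_expiration_days = 90)
  FROM FILES(
    format = 'AVRO',
    uris = ['gs://bucket/path/events/*.avro'])
```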
load_option_list
Specifies options for loading data from external files. The `format` and `uris` options are required. Specify the option list in the following format: `NAME=VALUE, ...`
| Option | Details |
|---|---|
| `allow_jagged_rows` | `BOOL`. If `TRUE`, allow rows that are missing trailing optional columns. The missing values are treated as `NULL`. Applies to CSV data. |
| `allow_quoted_newlines` | `BOOL`. If `TRUE`, allow quoted data sections that contain newline characters in the file. Applies to CSV data. |
| `bigtable_options` | `STRING`. Only required when creating a Bigtable external table. Specifies the schema of the Bigtable external table in JSON format. For a list of Bigtable table definition options, see BigtableOptions in the REST API reference. |
| `column_name_character_map` | `STRING`. Defines the scope of supported column name characters and the handling behavior of unsupported characters. The default setting is `STRICT`. Supported values include `STRICT`, `V1`, and `V2`. Applies to CSV and Parquet data. |
| `compression` | `STRING`. The compression type of the data source. Supported values include `GZIP`. If not specified, the data source is uncompressed. Applies to CSV and JSON data. |
| `decimal_target_types` | `ARRAY<STRING>`. Determines how to convert a `Decimal` type. Equivalent to ExternalDataConfiguration.decimal_target_types. Example: `["NUMERIC", "BIGNUMERIC"]`. |
| `enable_list_inference` | `BOOL`. If `TRUE`, use schema inference specifically for the Parquet `LIST` logical type. Applies to Parquet data. |
| `enable_logical_types` | `BOOL`. If `TRUE`, convert Avro logical types into their corresponding SQL types. For more information, see Logical types. Applies to Avro data. |
| `encoding` | `STRING`. The character encoding of the data. Supported values include `UTF8` (or `UTF-8`), `ISO_8859_1` (or `ISO-8859-1`), `UTF-16BE`, `UTF-16LE`, `UTF-32BE`, and `UTF-32LE`. Applies to CSV data. |
| `enum_as_string` | `BOOL`. If `TRUE`, infer the Parquet `ENUM` logical type as `STRING` instead of `BYTES` by default. Applies to Parquet data. |
| `field_delimiter` | `STRING`. The separator for fields in a CSV file. Applies to CSV data. |
| `format` | `STRING`. The format of the external data. Supported values for `LOAD DATA` include `AVRO`, `CSV`, `JSON`, `ORC`, and `PARQUET`. Supported values for external tables additionally include `CLOUD_BIGTABLE`, `DATASTORE_BACKUP`, `DELTA_LAKE`, `GOOGLE_SHEETS`, and `ICEBERG`. The value `JSON` is equivalent to `NEWLINE_DELIMITED_JSON`. |
| `hive_partition_uri_prefix` | `STRING`. A common prefix for all source URIs before the partition key encoding begins. Applies only to hive-partitioned external tables. Applies to Avro, CSV, JSON, Parquet, and ORC data. Example: `"gs://bucket/path"`. |
| `file_set_spec_type` | `STRING`. Specifies how to interpret source URIs for load jobs and external tables. Supported values include `FILE_SYSTEM_MATCH` (the default), which expands source URIs by listing files from the object store, and `NEW_LINE_DELIMITED_MANIFEST`, which indicates that the provided URIs are newline-delimited manifest files with one URI per line. For example, if you have a source URI of `"gs://bucket/path/file"` and the `file_set_spec_type` is `FILE_SYSTEM_MATCH`, then the file is used directly as a data file. If the `file_set_spec_type` is `NEW_LINE_DELIMITED_MANIFEST`, then each line in the file is interpreted as a URI that points to a data file. |
| `ignore_unknown_values` | `BOOL`. If `TRUE`, ignore extra values that are not represented in the table schema, without returning an error. Applies to CSV and JSON data. |
| `json_extension` | `STRING`. For JSON data, indicates a particular JSON interchange format. If not specified, BigQuery reads the data as generic JSON records. Supported values include `GEOJSON` (newline-delimited GeoJSON data). |
| `max_bad_records` | `INT64`. The maximum number of bad records to ignore when reading the data. Applies to CSV, JSON, and Google Sheets data. |
| `max_staleness` | `INTERVAL`. Applicable for BigLake tables and object tables. Specifies whether cached metadata is used by operations against the table, and how fresh the cached metadata must be in order for the operation to use it. To disable metadata caching, specify 0. This is the default. To enable metadata caching, specify an interval literal value between 30 minutes and 7 days, for example `INTERVAL 4 HOUR` for a 4-hour staleness interval. |
| `null_marker` | `STRING`. The string that represents `NULL` values in a CSV file. Applies to CSV data. |
| `null_markers` | `ARRAY<STRING>`. (Preview) The list of strings that represent `NULL` values in a CSV file. This option cannot be used with the `null_marker` option. Applies to CSV data. |
| `object_metadata` | `STRING`. Only required when creating an object table. Set the value of this option to `SIMPLE` when creating an object table. |
| `preserve_ascii_control_characters` | `BOOL`. If `TRUE`, then the embedded ASCII control characters, which are the first 32 characters in the ASCII table, are preserved. Applies to CSV data. |
| `quote` | `STRING`. The string used to quote data sections in a CSV file. If your data contains quoted newline characters, also set the `allow_quoted_newlines` property to `TRUE`. Applies to CSV data. |
| `skip_leading_rows` | `INT64`. The number of rows at the top of a file to skip when reading the data. Applies to CSV and Google Sheets data. |
| `source_column_match` | `STRING`. (Preview) Controls the strategy used to match loaded columns to the schema. If this value is unspecified, the default is based on how the schema is provided: if autodetect is enabled, the default behavior is to match columns by name; otherwise, the default is to match columns by position. This keeps the behavior backward-compatible. Supported values include `POSITION`, which matches columns by position, and `NAME`, which matches columns by name using the header row. |
| `tags` | `ARRAY<STRUCT<STRING, STRING>>`. An array of IAM tags for the table, expressed as key-value pairs. The key should be the namespaced key name, and the value should be the short name. |
| `time_zone` | `STRING`. (Preview) Default time zone that applies when parsing timestamp values that don't have a specific time zone. Check valid time zone names. If this value is not present, timestamp values without a specific time zone are parsed using the default time zone UTC. Applies to CSV and JSON data. |
| `date_format` | `STRING`. (Preview) Format elements that define how `DATE` values are formatted in the input files (for example, `MM/DD/YYYY`). If this value is present, this format is the only compatible `DATE` format. Schema auto-detection also decides the `DATE` column type based on this format instead of the existing format. If this value is not present, the `DATE` field is parsed with the default formats. Applies to CSV and JSON data. |
| `datetime_format` | `STRING`. (Preview) Format elements that define how `DATETIME` values are formatted in the input files (for example, `MM/DD/YYYY HH24:MI:SS`). If this value is present, this format is the only compatible `DATETIME` format. Schema auto-detection also decides the `DATETIME` column type based on this format instead of the existing format. If this value is not present, the `DATETIME` field is parsed with the default formats. Applies to CSV and JSON data. |
| `time_format` | `STRING`. (Preview) Format elements that define how `TIME` values are formatted in the input files (for example, `HH24:MI:SS`). If this value is present, this format is the only compatible `TIME` format. Schema auto-detection also decides the `TIME` column type based on this format instead of the existing format. If this value is not present, the `TIME` field is parsed with the default formats. Applies to CSV and JSON data. |
| `timestamp_format` | `STRING`. (Preview) Format elements that define how `TIMESTAMP` values are formatted in the input files (for example, `MM/DD/YYYY HH24:MI:SS`). If this value is present, this format is the only compatible `TIMESTAMP` format. Schema auto-detection also decides the `TIMESTAMP` column type based on this format instead of the existing format. If this value is not present, the `TIMESTAMP` field is parsed with the default formats. Applies to CSV and JSON data. |
| `uris` | For external tables, including object tables, that aren't Bigtable tables: `ARRAY<STRING>`. An array of fully qualified URIs for the external data locations. Each URI can contain one asterisk (`*`) wildcard character. Examples of valid `uris` values: `['gs://bucket/path1/myfile.csv']`, `['gs://bucket/path1/*.csv']`, and `['gs://bucket/path1/*', 'gs://bucket/path2/file00*']`. For Bigtable tables: `STRING`. The URI identifying the Bigtable table to use as a data source. You can only specify one Bigtable URI. For more information on constructing a Bigtable URI, see Retrieve the Bigtable URI. |
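To see how several of these options combine, the following sketch (hypothetical table and paths) loads semicolon-delimited CSV exports while tolerating a small number of bad records:

```sql
-- Hypothetical example: combining several CSV load options.
LOAD DATA INTO mydataset.table1
  FROM FILES(
    format = 'CSV',
    uris = ['gs://bucket/path/export_*.csv'],
    skip_leading_rows = 1,
    field_delimiter = ';',
    null_marker = '\\N',
    allow_quoted_newlines = TRUE,
    max_bad_records = 10)
```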
Examples
The following examples show common use cases for the `LOAD DATA` statement.
Load data into a table
The following example loads an Avro file into a table. Avro is a self-describing format, so BigQuery infers the schema.
```sql
LOAD DATA INTO mydataset.table1
  FROM FILES(
    format = 'AVRO',
    uris = ['gs://bucket/path/file.avro'])
```
The following example loads two CSV files into a table, using schema auto-detection.
```sql
LOAD DATA INTO mydataset.table1
  FROM FILES(
    format = 'CSV',
    uris = [
      'gs://bucket/path/file1.csv',
      'gs://bucket/path/file2.csv'])
```
Load data using a schema
The following example loads a CSV file into a table, using a specified table schema.
```sql
LOAD DATA INTO mydataset.table1 (
    x INT64,
    y STRING)
  FROM FILES(
    skip_leading_rows = 1,
    format = 'CSV',
    uris = ['gs://bucket/path/file.csv'])
```
Set options when creating a new table
The following example creates a new table with a description and an expiration time.
```sql
LOAD DATA INTO mydataset.table1
  OPTIONS(
    description = "my table",
    expiration_timestamp = "2025-01-01 00:00:00 UTC")
  FROM FILES(
    format = 'AVRO',
    uris = ['gs://bucket/path/file.avro'])
```
Overwrite an existing table
The following example overwrites an existing table.
```sql
LOAD DATA OVERWRITE mydataset.table1
  FROM FILES(
    format = 'AVRO',
    uris = ['gs://bucket/path/file.avro'])
```
Load data into a temporary table
The following example loads an Avro file into a temporary table.
```sql
LOAD DATA INTO TEMP TABLE table1
  FROM FILES(
    format = 'AVRO',
    uris = ['gs://bucket/path/file.avro'])
```
Specify table partitioning and clustering
The following example creates a table that is partitioned by the `transaction_date` field and clustered by the `customer_id` field. It also configures the partitions to expire after three days.
```sql
LOAD DATA INTO mydataset.table1
  PARTITION BY transaction_date
  CLUSTER BY customer_id
  OPTIONS(
    partition_expiration_days = 3)
  FROM FILES(
    format = 'AVRO',
    uris = ['gs://bucket/path/file.avro'])
```
Load data into a partition
The following example loads data into a selected partition of an ingestion-time partitioned table:
```sql
LOAD DATA INTO mydataset.table1
  PARTITIONS(_PARTITIONTIME = TIMESTAMP '2016-01-01')
  PARTITION BY _PARTITIONTIME
  FROM FILES(
    format = 'AVRO',
    uris = ['gs://bucket/path/file.avro'])
```
Load a file that is externally partitioned
The following example loads a set of external files that use a hive partitioning layout.
```sql
LOAD DATA INTO mydataset.table1
  FROM FILES(
    format = 'AVRO',
    uris = ['gs://bucket/path/*'],
    hive_partition_uri_prefix = 'gs://bucket/path')
  WITH PARTITION COLUMNS(
    field_1 STRING, -- column order must match the external path
    field_2 INT64)
```
The following example infers the partitioning layout:
```sql
LOAD DATA INTO mydataset.table1
  FROM FILES(
    format = 'AVRO',
    uris = ['gs://bucket/path/*'],
    hive_partition_uri_prefix = 'gs://bucket/path')
  WITH PARTITION COLUMNS
```
If you include both `column_list` and `WITH PARTITION COLUMNS`, then you must explicitly list the partitioning columns. For example, the following query returns an error:
```sql
-- This query returns an error.
LOAD DATA INTO mydataset.table1 (
    x INT64, -- column_list is given but the partition column list is missing
    y STRING)
  FROM FILES(
    format = 'AVRO',
    uris = ['gs://bucket/path/*'],
    hive_partition_uri_prefix = 'gs://bucket/path')
  WITH PARTITION COLUMNS
```
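In contrast, the statement succeeds if the partition columns are listed explicitly along with the `column_list`:

```sql
-- This query succeeds: both column_list and partition_column_list are given.
LOAD DATA INTO mydataset.table1 (
    x INT64,
    y STRING)
  FROM FILES(
    format = 'AVRO',
    uris = ['gs://bucket/path/*'],
    hive_partition_uri_prefix = 'gs://bucket/path')
  WITH PARTITION COLUMNS(
    field_1 STRING,
    field_2 INT64)
```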
Load data with cross-cloud transfer
Example 1
The following example loads a Parquet file named `sample.parquet` from an Amazon S3 bucket into the `testparquet` table, using schema auto-detection:
```sql
LOAD DATA INTO mydataset.testparquet
  FROM FILES(
    uris = ['s3://test-bucket/sample.parquet'],
    format = 'PARQUET')
  WITH CONNECTION `aws-us-east-1.test-connection`
```
Example 2
The following example loads CSV files with the prefix `sampled*` from your Azure Blob Storage into the `test_csv` table, with predefined columns and partitioning by the `Time` column:
```sql
LOAD DATA INTO mydataset.test_csv (
    Number INT64,
    Name STRING,
    Time DATE)
  PARTITION BY Time
  FROM FILES(
    format = 'CSV',
    uris = ['azure://test.blob.core.windows.net/container/sampled*'],
    skip_leading_rows = 1)
  WITH CONNECTION `azure-eastus2.test-connection`
```
Example 3
The following example overwrites the existing table `testparquet` with data from a file named `sample.parquet`, using schema auto-detection:
```sql
LOAD DATA OVERWRITE mydataset.testparquet
  FROM FILES(
    uris = ['s3://test-bucket/sample.parquet'],
    format = 'PARQUET')
  WITH CONNECTION `aws-us-east-1.test-connection`
```