Loading externally partitioned data

BigQuery can load data that is stored in Cloud Storage using a Hivepartitioning layout.Hive partitioning means that the external data isorganized into multiple files, with a naming convention to separate files intodifferent partitions. For more information, seeSupported data layouts.

By default, the data is not partitioned in BigQuery afteryou load it, unless you explicitly create apartitioned table.

Load Hive partitioned data

To load Hive partitioned data, choose one of the following options:

Console

In the Google Cloud console, go toBigQuery.
Go to BigQuery
In the left pane, clickExplorer.
In theExplorer pane, expand your project, clickDatasets, and then select a dataset.
ClickActions, and then clickCreate table. This opens theCreate table pane.
In theSource section, specify the following details:

ForCreate table from, selectGoogle Cloud Storage.
ForSelect file from Cloud Storage bucket, enter the path to the Cloud Storage folder, usingwildcards. For example,my_bucket/my_files*. The Cloud Storage bucket must be in the same location as the dataset that contains the table you want to create, append, or overwrite.
From theFile format list, select the file type.
Select theSource data partitioning checkbox, and then forSelect Source URI Prefix, enter the Cloud Storage URI prefix. For example,gs://my_bucket/my_files.
In thePartition inference mode section, select one of the following options:
- Automatically infer types: set the partition schema detection mode toAUTO.
- All columns are strings: set the partition schema detection mode toSTRINGS.
- Provide my own: set the partition schema detection mode toCUSTOM and manually enter the schema information for the partition keys. For more information, seeProvide a custom partition key schema.
Optional: To require a partition filter on all queries for this table, select theRequire partition filter checkbox. Requiring a partition filter can reduce cost and improve performance. For more information, seeRequiring predicate filters on partition keys in queries.

In theDestination section, specify the following details:
1. ForProject, select the project in which you want to create the table.
2. ForDataset, select the dataset in which you want to create the table.
3. ForTable, enter the name of the table that you want to create.
4. ForTable type, selectNative table .
In theSchema section, enter theschema definition.
To enable theauto detection of schema, selectAuto detect.
To ignore rows with extra column values that do not match the schema, expand theAdvanced options section and selectUnknown values.
ClickCreate table.

SQL

To create an externally partitioned table, use theWITH PARTITION COLUMNS clause of theLOAD DATA statement to specify the partition schema details.

For an example, seeLoad a file that is externally partitioned.

bq

Load Hive partitioned data using automatic partition key type detection:

bqload--source_format=ORC--hive_partitioning_mode=AUTO\--hive_partitioning_source_uri_prefix=gcs_uri_shared_prefix\dataset.tablegcs_uris

Load Hive partitioned data using string-typed partition key detection:

bqload--source_format=CSV--autodetect\--hive_partitioning_mode=STRINGS\--hive_partitioning_source_uri_prefix=gcs_uri_shared_prefix\dataset.tablegcs_uris

Load Hive partitioned data using a custom partition key schema that isencoded using thesource\_uri\_prefix field:

bqload--source_format=JSON--hive_partitioning_mode=CUSTOM\--hive_partitioning_source_uri_prefix=gcs_uri_shared_prefix/partition_key_schema\dataset.tablegcs_urisfile_schema

The partition key schema is encoded immediately following the source URIprefix. Use the following format to specify--hive_partitioning_source_uri_prefix:

--hive_partitioning_source_uri_prefix=gcs_uri_shared_prefix/{key1:TYPE1}/{key2:TYPE2}/{key3:TYPE3}

API

Support for Hive partitioning exists by setting theHivePartitioningOptionson theJobConfigurationLoad.

Note: WhenhivePartitioningOptions.mode is set toCUSTOM, you mustencode the partition key schema in thehivePartitioningOptions.sourceUriPrefix field as follows:gs://BUCKET/PATH_TO_TABLE/{KEY1:TYPE1}/{KEY2:TYPE2}/...

Perform incremental loads

Consider the following data layout:

gs://my_bucket/my_table/dt=2019-10-31/val=1/file1gs://my_bucket/my_table/dt=2018-10-31/val=2/file2gs://my_bucket/my_table/dt=2017-10-31/val=3/file3gs://my_bucket/my_table/dt=2016-10-31/val=4/file4

To load only data from 2019-10-31, do the following:

Set the Hive partitioning mode toAUTO,STRINGS, orCUSTOM.
Set the source URI prefix togs://my_bucket/my_table/forAUTO orSTRINGS Hive partitioning modes. For CUSTOM,providegs://my_bucket/my_table/{dt:DATE}/{val:INTEGER}.
Use the URIgs://my_bucket/my_table/dt=2019-10-31/*.
Data is loaded withdt andval columns included, withvalues2019-10-31 and1, respectively.

To load only data from specific files, do the following:

Set theHive partitioning mode toAUTO,STRINGS, orCUSTOM.
Set the source URI prefix togs://my_bucket/my_table/forAUTO orSTRINGS Hive partitioning modes. ForCUSTOM,providegs://my_bucket/my_table/{dt:DATE}/{val:INTEGER}.
Use the URIsgs://my_bucket/my_table/dt=2017-10-31/val=3/file3,gs://my_bucket/my_table/dt=2016-10-31/val=4/file4.
Data is loaded from both files with thedt andvalcolumns filled in.

Partition schema

The following sections explain thedefault Hive partitioned layout and theschema detection modes that BigQuery supports.

Supported data layouts

Hive partition keys appear as normal columns when you query data fromCloud Storage.The data must follow a default Hive partitioned layout.For example, the following files follow the default layout—thekey-value pairs are configured as directories with an equal sign (=) as a separator,and the partition keys are always in the same order:

gs://my_bucket/my_table/dt=2019-10-31/lang=en/my_filenamegs://my_bucket/my_table/dt=2018-10-31/lang=fr/my_filename

The common source URI prefix in this example isgs://my_bucket/my_table.

Unsupported data layouts

If the partition key names are not encoded in the directory path,partition schema detection fails. For example, consider the following path,which does not encode the partition key names:

gs://my_bucket/my_table/2019-10-31/en/my_filename

Files where the schema is not in a consistent order also fail detection.For example, consider the following two files with inverted partitionkey encodings:

gs://my_bucket/my_table/dt=2019-10-31/lang=en/my_filenamegs://my_bucket/my_table/lang=fr/dt=2018-10-31/my_filename

Detection modes

BigQuery supports three modes of Hive partition schema detection:

AUTO: Key names and types are automatically detected. The following typescan be detected:
- STRING
- INTEGER
- DATE
  For example,/date=2018-10-18/.
- TIMESTAMP
  For example,/time=2018-10-18 16:00:00+00/.
STRINGS: Key names are automatically converted toSTRING type.
CUSTOM: Partition key schema is encoded as specified in the source URIprefix.

Custom partition key schema

To use aCUSTOM schema, you must specify the schema in the source URI prefixfield. Using aCUSTOM schema lets you specify the type for each partition key.The values must validly parse as the specified type or the query fails.

For example, if you set thesource_uri_prefix flag togs://my_bucket/my_table/{dt:DATE}/{val:STRING},BigQuery treatsval as a STRING,dt as a DATE, andusesgs://my_bucket/my_table as the source URI prefix for the matched files.

Limitations

Hive partitioning support is built assuming a common source URI prefix forall URIs that ends immediately before partition encoding, as follows:gs://BUCKET/PATH_TO_TABLE/.
The directory structure of a Hive partitioned table is assumed to have the samepartitioning keys appear in the same order, with a maximum of tenpartition keys per table.
The data must follow adefault Hive partitioning layout.
The Hive partitioning keys and the columns in the underlying files cannot overlap.
Support is forGoogleSQL only.
Alllimitations for loading from Cloud Storageapply.

What's next

Learn aboutpartitioned tables.
Learn how touse SQL in BigQuery.

Except as otherwise noted, the content of this page is licensed under theCreative Commons Attribution 4.0 License, and code samples are licensed under theApache 2.0 License. For details, see theGoogle Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2026-02-18 UTC.

Movatterモバイル変換