Loading externally partitioned data
BigQuery can load data that is stored in Cloud Storage using a Hivepartitioning layout.Hive partitioning means that the external data isorganized into multiple files, with a naming convention to separate files intodifferent partitions. For more information, seeSupported data layouts.
By default, the data is not partitioned in BigQuery afteryou load it, unless you explicitly create apartitioned table.
Load Hive partitioned data
To load Hive partitioned data, choose one of the following options:
Console
In the Google Cloud console, go toBigQuery.
- In the left pane, clickExplorer.
- In theExplorer pane, expand your project, clickDatasets, and then select a dataset.
- ClickActions, and then clickCreate table. This opens theCreate table pane.
- In theSource section, specify the following details:
- ForCreate table from, selectGoogle Cloud Storage.
- ForSelect file from Cloud Storage bucket, enter the path to the Cloud Storage folder, usingwildcards. For example,
my_bucket/my_files*. The Cloud Storage bucket must be in the same location as the dataset that contains the table you want to create, append, or overwrite. - From theFile format list, select the file type.
- Select theSource data partitioning checkbox, and then forSelect Source URI Prefix, enter the Cloud Storage URI prefix. For example,
gs://my_bucket/my_files. - In thePartition inference mode section, select one of the following options:
- Automatically infer types: set the partition schema detection mode to
AUTO. - All columns are strings: set the partition schema detection mode to
STRINGS. - Provide my own: set the partition schema detection mode to
CUSTOMand manually enter the schema information for the partition keys. For more information, seeProvide a custom partition key schema.
- Automatically infer types: set the partition schema detection mode to
- Optional: To require a partition filter on all queries for this table, select theRequire partition filter checkbox. Requiring a partition filter can reduce cost and improve performance. For more information, seeRequiring predicate filters on partition keys in queries.
- In theDestination section, specify the following details:
- ForProject, select the project in which you want to create the table.
- ForDataset, select the dataset in which you want to create the table.
- ForTable, enter the name of the table that you want to create.
- ForTable type, selectNative table .
- In theSchema section, enter theschema definition.
- To enable theauto detection of schema, selectAuto detect.
- To ignore rows with extra column values that do not match the schema, expand theAdvanced options section and selectUnknown values.
- ClickCreate table.
SQL
To create an externally partitioned table, use theWITH PARTITION COLUMNS clause of theLOAD DATA statement to specify the partition schema details.
For an example, seeLoad a file that is externally partitioned.
bq
Load Hive partitioned data using automatic partition key type detection:
bqload--source_format=ORC--hive_partitioning_mode=AUTO\--hive_partitioning_source_uri_prefix=gcs_uri_shared_prefix\dataset.tablegcs_uris
Load Hive partitioned data using string-typed partition key detection:
bqload--source_format=CSV--autodetect\--hive_partitioning_mode=STRINGS\--hive_partitioning_source_uri_prefix=gcs_uri_shared_prefix\dataset.tablegcs_uris
Load Hive partitioned data using a custom partition key schema that isencoded using thesource\_uri\_prefix field:
bqload--source_format=JSON--hive_partitioning_mode=CUSTOM\--hive_partitioning_source_uri_prefix=gcs_uri_shared_prefix/partition_key_schema\dataset.tablegcs_urisfile_schema
The partition key schema is encoded immediately following the source URIprefix. Use the following format to specify--hive_partitioning_source_uri_prefix:
--hive_partitioning_source_uri_prefix=gcs_uri_shared_prefix/{key1:TYPE1}/{key2:TYPE2}/{key3:TYPE3}
API
Support for Hive partitioning exists by setting theHivePartitioningOptionson theJobConfigurationLoad.
hivePartitioningOptions.mode is set toCUSTOM, you mustencode the partition key schema in thehivePartitioningOptions.sourceUriPrefix field as follows:gs://BUCKET/PATH_TO_TABLE/{KEY1:TYPE1}/{KEY2:TYPE2}/...Perform incremental loads
Consider the following data layout:
gs://my_bucket/my_table/dt=2019-10-31/val=1/file1gs://my_bucket/my_table/dt=2018-10-31/val=2/file2gs://my_bucket/my_table/dt=2017-10-31/val=3/file3gs://my_bucket/my_table/dt=2016-10-31/val=4/file4To load only data from 2019-10-31, do the following:
- Set theHive partitioning mode to
AUTO,STRINGS, orCUSTOM. - Set the source URI prefix to
gs://my_bucket/my_table/forAUTOorSTRINGSHive partitioning modes. For CUSTOM,providegs://my_bucket/my_table/{dt:DATE}/{val:INTEGER}. - Use the URI
gs://my_bucket/my_table/dt=2019-10-31/*. - Data is loaded with
dtandvalcolumns included, withvalues2019-10-31and1, respectively.
To load only data from specific files, do the following:
- Set theHive partitioning mode to
AUTO,STRINGS, orCUSTOM. - Set the source URI prefix to
gs://my_bucket/my_table/forAUTOorSTRINGSHive partitioning modes. ForCUSTOM,providegs://my_bucket/my_table/{dt:DATE}/{val:INTEGER}. - Use the URIs
gs://my_bucket/my_table/dt=2017-10-31/val=3/file3,gs://my_bucket/my_table/dt=2016-10-31/val=4/file4. - Data is loaded from both files with the
dtandvalcolumns filled in.
Partition schema
The following sections explain thedefault Hive partitioned layout and theschema detection modes that BigQuery supports.
Supported data layouts
Hive partition keys appear as normal columns when you query data fromCloud Storage.The data must follow a default Hive partitioned layout.For example, the following files follow the default layout—thekey-value pairs are configured as directories with an equal sign (=) as a separator,and the partition keys are always in the same order:
gs://my_bucket/my_table/dt=2019-10-31/lang=en/my_filenamegs://my_bucket/my_table/dt=2018-10-31/lang=fr/my_filenameThe common source URI prefix in this example isgs://my_bucket/my_table.
Unsupported data layouts
If the partition key names are not encoded in the directory path,partition schema detection fails. For example, consider the following path,which does not encode the partition key names:
gs://my_bucket/my_table/2019-10-31/en/my_filenameFiles where the schema is not in a consistent order also fail detection.For example, consider the following two files with inverted partitionkey encodings:
gs://my_bucket/my_table/dt=2019-10-31/lang=en/my_filenamegs://my_bucket/my_table/lang=fr/dt=2018-10-31/my_filenameDetection modes
BigQuery supports three modes of Hive partition schema detection:
AUTO: Key names and types are automatically detected. The following typescan be detected:STRINGS: Key names are automatically converted toSTRINGtype.CUSTOM: Partition key schema is encoded as specified in the source URIprefix.
Custom partition key schema
To use aCUSTOM schema, you must specify the schema in the source URI prefixfield. Using aCUSTOM schema lets you specify the type for each partition key.The values must validly parse as the specified type or the query fails.
For example, if you set thesource_uri_prefix flag togs://my_bucket/my_table/{dt:DATE}/{val:STRING},BigQuery treatsval as a STRING,dt as a DATE, andusesgs://my_bucket/my_table as the source URI prefix for the matched files.
Limitations
- Hive partitioning support is built assuming a common source URI prefix forall URIs that ends immediately before partition encoding, as follows:
gs://BUCKET/PATH_TO_TABLE/. - The directory structure of a Hive partitioned table is assumed to have the samepartitioning keys appear in the same order, with a maximum of tenpartition keys per table.
- The data must follow adefault Hive partitioning layout.
- The Hive partitioning keys and the columns in the underlying files cannot overlap.
Support is forGoogleSQL only.
What's next
- Learn aboutpartitioned tables.
- Learn how touse SQL in BigQuery.
Except as otherwise noted, the content of this page is licensed under theCreative Commons Attribution 4.0 License, and code samples are licensed under theApache 2.0 License. For details, see theGoogle Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2026-02-18 UTC.