Create a table definition file for an external data source
This page describes how to create a table definition file for an external datasource. An external data source is a data source that you can query directlyeven though the data is not stored in BigQuery.
A table definition file contains an external table's schema definition andmetadata, such as the table's data format and related properties. When youcreate a table definition file, you can use schemaauto-detectionto define the schema for an external data source. You can provide the schemainline or you can provide a JSON file containing the schema definition.
Table definition files are used with the bq command-line tool. Theproperties in a table definition file also apply to creating anExternalDataConfigurationwhen you use the REST API. You do not use table definition files when you createan external table by using the Google Cloud console.
You can create table definition files to describe apermanent or temporaryexternal table for thefollowing external data sources:
Cloud Storage
- Comma-separated values (CSV)
- Newline-delimited JSON
- Avro files
- Datastore export files
- ORC files
- Parquet files
- Firestore export files
Google Drive
- Comma-separated values (CSV)
- Newline-delimited JSON
- Avro files
- Google Sheets
Bigtable
Before you begin
To create a table definition file, you need the URI for your data source:
- For a Drive data source, you need theDrive URI
- For a Cloud Storage data source, you need theCloud Storage URI
- For a Bigtable data source, you need theBigtable URI
Create a definition file for CSV, JSON, or Google Sheets files
Use one of the following methods to create a table definitionfile for CSV, JSON, or Google Sheets files in Cloud Storageor Drive:
Use theautodetect flag
If you specify a CSV, JSON, or Google Sheets file without including an inlineschema description or a schema file, you can use the--autodetect flagto set the"autodetect" option totrue in the table definition file. Whenauto-detect is enabled, BigQuery makes a best-effort attempt toautomatically infer the schema. For more information, seeSchema auto-detection for external data sources.
Use auto-detect with a Cloud Storage data source
Create a table definition file for a Cloud Storage data source:
Use the
bq mkdefcommandwith the--autodetectflag to create a table definition file. Themkdefcommand generates a table definition file in JSON format. The followingexample creates a table definition and writes the output to a file:/tmp/file_name.bqmkdef\--autodetect\--source_format=SOURCE_FORMAT\"URI">/tmp/FILE_NAME
Replace the following:
SOURCE_FORMAT: your file formatFILE_NAME: the name of your table definition fileURI: theCloud Storage URIFor example,
gs://mybucket/myfile.
(Optional) Open the table definition file in a text editor. For example,the command
nano /tmp/file_nameopens the file innano. The file should look like the following for a CSV external datasource. Notice"autodetect"is set totrue.{"autodetect":true,"csvOptions":{"allowJaggedRows":false,"allowQuotedNewlines":false,"encoding":"UTF-8","fieldDelimiter":",","quote":"\"","skipLeadingRows":0},"sourceFormat":"CSV","sourceUris":["URI"]}
(Optional) Manually edit the table definition file to modify, add, ordelete general settings such as
maxBadRecordsandignoreUnknownValues. Thereare no configuration settings that are specific to JSON source files, but thereare settings that apply toCSVandGoogle Sheetsfiles. For more information, seeExternalDataConfigurationin the API reference.
Use auto-detect with a Drive data source
Create a table definition file for a Drive data source:
Use the
bq mkdefcommandwith the--autodetectflag to create a table definition. Themkdefcommandgenerates a table definition file in JSON format. The following examplecreates a table definition and writes the output to a file:/tmp/file_name.bqmkdef\--autodetect\--source_format=SOURCE_FORMAT\"URI">/tmp/FILE_NAME
Replace the following:
SOURCE_FORMAT: your file formatFILE_NAME: the name of your table definition fileURI: theDrive URIFor example,
https://drive.google.com/open?id=123ABCD123AbcD123Abcd.
Open the table definition file in a text editor. For example, the command
nano /tmp/file_nameopens the file in nano. The fileshould look like the following for a Google Sheets external data source. Notice"autodetect"is set totrue.{"autodetect":true,"sourceFormat":"GOOGLE_SHEETS","sourceUris":["URI"]}
(Optional) Manually edit the table definition file to modify, add, ordelete general settings such as
maxBadRecordsandignoreUnknownValues. Thereare no configuration settings that are specific to JSON source files, but thereare settings that apply toCSVandGoogle Sheetsfiles. For more information, seeExternalDataConfigurationin the API reference.To specify a particular sheet or a cell range in a Google Sheets file, addthe
rangeproperty to theGoogleSheetsOptionsobject in the table definition file. To query a particular sheet,specify the sheet name. To query a cell range, specify the range in the form:sheet_name!top_left_cell_id:bottom_right_cell_id,for example,"Sheet1!A1:B20". If therangeparameter is not specified, thefirst sheet in the file is used.
Use an inline schema
If you do not want to use schema auto-detect, you can create a table definitionfile by providing an inline schema definition. To provide an inline schemadefinition, list the fields and data types on the command line in the followingformat:FIELD:DATA_TYPE,FIELD:DATA_TYPE.
Use an inline schema with a Cloud Storage or Drive data source
Create a table definition for a Cloud Storage or Drive data source by using an inline schema definition:
Use the
bq mkdefcommandwith the--noautodetectflag to create a table definition. Themkdefcommand generates a table definition file in JSON format. The followingexample creates a table definition and writes the output to a file:/tmp/file_name.bqmkdef\--noautodetect\--source_format=SOURCE_FORMAT\"URI"\FIELD:DATA_TYPE,FIELD:DATA_TYPE>/tmp/FILE_NAME
Replace the following
SOURCE_FORMAT: the source file formatURI: theCloud Storage URIor yourDrive URIFor example,
gs://mybucket/myfilefor Cloud Storage orhttps://drive.google.com/open?id=123ABCD123AbcD123Abcdfor Drive.FIELD:DATA_TYPE,FIELD:DATA_TYPE: the schemadefinitionFor example,
Name:STRING,Address:STRING, ....FILE_NAME: the name of your table definition file
(Optional) Open the table definition file in a text editor. For example,the command
nano /tmp/file_nameopens the file innano. The file should look like the following. Notice"autodetect"is notenabled, and the schema information is written to the table definition file.{"schema":{"fields":[{"name":"FIELD","type":"DATA_TYPE"},{"name":"FIELD","type":"DATA_TYPE"}...]},"sourceFormat":"NEWLINE_DELIMITED_JSON","sourceUris":["URI"]}
(Optional) Manually edit the table definition file to modify, add, ordelete general settings such as
maxBadRecordsandignoreUnknownValues. Thereare no configuration settings that are specific to JSON source files, but thereare settings that apply toCSVandGoogle Sheetsfiles. For more information, seeExternalDataConfigurationin the API reference.
Use a JSON schema file
If you do not want to use auto-detect or provide an inline schema definition,you can create a JSON schema file and reference it when creating your tabledefinition file. Create the JSON schema file manually on your local machine.Referencing a JSON schema file stored in Cloud Storageor in Drive is not supported.
Use a schema file with a Cloud Storage data source
Create a table definition for a Cloud Storage data sourceusing a JSON schema file:
Use the
bq mkdefcommandwith the--noautodetectflag to create a table definition. Themkdefcommand generates a table definition file in JSON format. The followingexample creates a table definition and writes the output to a file:/tmp/file_name.bqmkdef\--noautodetect\--source_format=SOURCE_FORMAT\"URI"\PATH_TO_SCHEMA_FILE>/tmp/FILE_NAME
Replace the following:
SOURCE_FORMAT: your file formatFILE_NAME: the name of your table definition fileURI: theCloud Storage URIFor example,
gs://mybucket/myfile.PATH_TO_SCHEMA_FILE: the location of the JSON schemafile on your local machine
(Optional) Open the table definition file in a text editor. For example,the command
nano /tmp/file_nameopens the file in
nano. The file should look like the following. Notice"autodetect"is notenabled, and the schema information is written to the table definition file.{"schema":{"fields":[{"name":"FIELD","type":"DATA_TYPE"},{"name":"FIELD","type":"DATA_TYPE"}...]},"sourceFormat":"NEWLINE_DELIMITED_JSON","sourceUris":["URI"]}
(Optional) Manually edit the table definition file to modify, add, ordelete general settings such as
maxBadRecordsandignoreUnknownValues. Thereare no configuration settings that are specific to JSON source files, but thereare settings that apply toCSVandGoogle Sheetsfiles. For more information, seeExternalDataConfigurationin the API reference.
Use a schema file with a Drive data source
Create a table definition for a Drive data sourceusing a JSON schema file:
Use the
bq mkdefcommandwith the--noautodetectflag to create a table definition. Themkdefcommand generates a table definition file in JSON format. The followingexample creates a table definition and writes the output to a file:/tmp/file_name.bqmkdef\--noautodetect\--source_format=source_format\"URI"\PATH_TO_SCHEMA_FILE>/tmp/FILE_NAME
Replace the following:
SOURCE_FORMAT: the source file formatURI: theDrive URIFor example,
https://drive.google.com/open?id=123ABCD123AbcD123Abcd.PATH_TO_SCHEMA_FILE: the location of the JSON schemafile on your local machineFILE_NAME: the name of your table definition file
Open the table definition file in a text editor. For example, the command
nano /tmp/file_nameopens the file in nano. The fileshould look like the following. Notice"autodetect"is not enabled, and theschema information is written to the table definition file.{"schema":{"fields":[{"name":"FIELD","type":"DATA_TYPE"},{"name":"FIELD","type":"DATA_TYPE"}...]},"sourceFormat":"GOOGLE_SHEETS","sourceUris":["URI"]}
(Optional) Manually edit the table definition file to modify, add, ordelete general settings such as
maxBadRecordsandignoreUnknownValues. Thereare no configuration settings that are specific to JSON source files, but thereare settings that apply toCSVandGoogle Sheetsfiles. For more information, seeExternalDataConfigurationin the API reference.To specify a particular sheet or a cell range in a Google Sheets file, addthe
rangeproperty to theGoogleSheetsOptionsobject in the table definition file. To query a particular sheet,specify the sheet name. To query a cell range, specify the range in the form:sheet_name!top_left_cell_id:bottom_right_cell_id,for example,"Sheet1!A1:B20". If therangeparameter is not specified, thefirst sheet in the file is used.
Create a definition file for self-describing formats
Avro, Parquet, and ORC areself-describing formats. Data files in theseformats contain their own schema information. If you use one of these formats asan external data source, then BigQuery automatically retrieves theschema using the source data. When creating a table definition, you don't needto use schema auto-detection, and you don't need to provide an inline schemadefinition or schema file.
You can create a table definition file for Avro, Parquet, or ORC data stored inCloud Storage or Drive:
Use the
bq mkdefcommandto create a table definition.bqmkdef\--source_format=FORMAT\"URI">FILE_NAME
Replace the following:
FORMAT: the source formatURI: theCloud Storage URIor yourDrive URIFor example,
gs://mybucket/myfilefor Cloud Storage orhttps://drive.google.com/open?id=123ABCD123AbcD123Abcdfor Drive.FILE_NAME: the name of your table definition file
Optional: Open the table definition file in a text editor. The file lookssimilar to the following:
{"sourceFormat":"AVRO","sourceUris":["URI"]}
Optional: Manually edit the table definition file to modify, add, ordelete general settings such as
maxBadRecordsandignoreUnknownValues.For more information, seeExternalDataConfigurationin the API reference.
Create a definition file for hive-partitioned data
Use thebq mkdef commandwith thehive_partitioning_mode and thehive_partitioning_source_uri_prefix flags tocreate a definition file for hive-partitioned data that's stored inCloud Storage, Amazon Simple Storage Service (Amazon S3), or Azure Blob Storage.
Create a definition file for Datastore and Firestore
If you use a Datastore or Firestore export as an external datasource, BigQuery automatically retrieves the schema using theself-describing source data. When creating a table definition, you don't need toprovide an inline schema definition or schema file.
You can create a table definition file for Datastore andFirestore export data stored in Cloud Storage:
Use the
bq mkdefcommandto create a table definition. You do not need to use the--noautodetectflag with Datastore or Firestore backup files. Schemaauto-detect is disabled for these file types. Themkdefcommand generates atable definition file in JSON format. The following example creates a tabledefinition and writes the output to a file:/tmp/file_name.bqmkdef\--source_format=DATASTORE_BACKUP\"URI">/tmp/FILE_NAME
Replace the following:
URI: theCloud Storage URIFILE_NAME: the name of your table definition file
The
DATASTORE_BACKUPsource format is used for bothDatastore and Firestore.(Optional) Open the table definition file in a text editor. For example,the command
nano /tmp/file_nameopens the file in nano.The file should look like the following. Notice there is no need for the"autodetect"setting.{"sourceFormat":"DATASTORE_BACKUP","sourceUris":["gs://URI"]}
(Optional) Manually edit the table definition file to modify, add, ordelete settings such as
maxBadRecordsandignoreUnknownValues. Thereare no configuration settings that are specific to Datastore andFirestore export files. For more information, seeExternalDataConfigurationin the API reference.
Create a definition file for Bigtable
When you create a table definition file for Bigtable, you manuallygenerate the file in JSON format. Using themkdef command to create a tabledefinition is not supported for Bigtable data sources.Schema auto-detect is also unsupported for Bigtable. For a list ofBigtable table definition options, seeBigtableOptions inthe REST API reference.
A JSON table definition file for Bigtable looks like the following.Using this table definition file, BigQuery reads data from asingle column family, interpreting the values as binary encoded integers.
{"sourceFormat":"BIGTABLE","sourceUris":["https://googleapis.com/bigtable/projects/PROJECT_ID/instances/INSTANCE_ID[/appProfiles/APP_PROFILE_ID]/tables/TABLE_NAME"],"bigtableOptions":{"columnFamilies":[{"familyId":"FAMILY_ID","type":"INTEGER","encoding":"BINARY"}]}}
Replace the following:
PROJECT_ID: the project containing yourBigtable clusterINSTANCE_ID: the Bigtable instance IDAPP_PROFILE_ID(optional): the ID of the app profilethat you want to use to read your Bigtable data.App profilesettings indicate whether the external tableuses Data Boost or provisioned nodes for compute.TABLE_NAME: the name of the table you're queryingFAMILY_ID: the column family identifier
For more information, seeRetrieve the Bigtable URI.
Wildcard support for table definition files
If your data is separated into multiple files, you can use an asterisk (*)wildcard to select multiple files. Use of the asterisk wildcard must followthese rules:
- The asterisk can appear inside the object name or at the end of the objectname.
- Using multiple asterisks is unsupported. For example, the path
gs://mybucket/fed-*/temp/*.csvis invalid. - Using an asterisk with the bucket name is unsupported.
Examples:
The following example shows how to select all of the files in all thefolders which start with the prefix
gs://mybucket/fed-samples/fed-sample:gs://mybucket/fed-samples/fed-sample*The following example shows how to select only files with a
.csvextensionin the folder namedfed-samplesand any subfolders offed-samples:gs://mybucket/fed-samples/*.csvThe following example shows how to select files with a naming pattern of
fed-sample*.csvin the folder namedfed-samples. This example doesn'tselect files in subfolders offed-samples.gs://mybucket/fed-samples/fed-sample*.csv
When using the bq command-line tool, you might need to escape the asterisk on someplatforms.
If you use an asterisk wildcard, enclose the bucket and filename in quotes. Forexample, if you have two files namedfed-sample000001.csv andfed-sample000002.csv and you want to use an asterisk to select both of them,the bucket URI would be"gs://mybucket/fed-sample*".
The* wildcard character is not allowed when creating table definition filesfor the following data sources:
- Bigtable. For Bigtable data, only one datasource can be specified. The URI value must be a valid HTTPS URL for aBigtable table.
- Datastore orFirestore. Datastoreor Firestore exports stored inCloud Storage. For Datastorebackups, only one data source can be specified. The URI value must end with
.backup_infoor.export_metadata. - Drive. Data stored in Drive.
What's next
- Learn how to queryCloud Storage data.
- Learn how to queryDrive data.
- Learn how to queryBigtable data.
Except as otherwise noted, the content of this page is licensed under theCreative Commons Attribution 4.0 License, and code samples are licensed under theApache 2.0 License. For details, see theGoogle Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-12-15 UTC.