Create a table definition file for an external data source

This page describes how to create a table definition file for an external datasource. An external data source is a data source that you can query directlyeven though the data is not stored in BigQuery.

A table definition file contains an external table's schema definition andmetadata, such as the table's data format and related properties. When youcreate a table definition file, you can use schemaauto-detectionto define the schema for an external data source. You can provide the schemainline or you can provide a JSON file containing the schema definition.

Table definition files are used with the bq command-line tool. Theproperties in a table definition file also apply to creating anExternalDataConfigurationwhen you use the REST API. You do not use table definition files when you createan external table by using the Google Cloud console.

You can create table definition files to describe apermanent or temporaryexternal table for thefollowing external data sources:

  • Cloud Storage

    • Comma-separated values (CSV)
    • Newline-delimited JSON
    • Avro files
    • Datastore export files
    • ORC files
    • Parquet files
    • Firestore export files
  • Google Drive

    • Comma-separated values (CSV)
    • Newline-delimited JSON
    • Avro files
    • Google Sheets
  • Bigtable

Before you begin

To create a table definition file, you need the URI for your data source:

Create a definition file for CSV, JSON, or Google Sheets files

Use one of the following methods to create a table definitionfile for CSV, JSON, or Google Sheets files in Cloud Storageor Drive:

Use theautodetect flag

If you specify a CSV, JSON, or Google Sheets file without including an inlineschema description or a schema file, you can use the--autodetect flagto set the"autodetect" option totrue in the table definition file. Whenauto-detect is enabled, BigQuery makes a best-effort attempt toautomatically infer the schema. For more information, seeSchema auto-detection for external data sources.

Use auto-detect with a Cloud Storage data source

Create a table definition file for a Cloud Storage data source:

  1. Use thebq mkdef commandwith the--autodetect flag to create a table definition file. Themkdefcommand generates a table definition file in JSON format. The followingexample creates a table definition and writes the output to a file:/tmp/file_name.

    bqmkdef\--autodetect\--source_format=SOURCE_FORMAT\"URI">/tmp/FILE_NAME

    Replace the following:

    • SOURCE_FORMAT: your file format
    • FILE_NAME: the name of your table definition file
    • URI: theCloud Storage URI

      For example,gs://mybucket/myfile.

  2. (Optional) Open the table definition file in a text editor. For example,the commandnano /tmp/file_name opens the file innano. The file should look like the following for a CSV external datasource. Notice"autodetect" is set totrue.

    {"autodetect":true,"csvOptions":{"allowJaggedRows":false,"allowQuotedNewlines":false,"encoding":"UTF-8","fieldDelimiter":",","quote":"\"","skipLeadingRows":0},"sourceFormat":"CSV","sourceUris":["URI"]}
  3. (Optional) Manually edit the table definition file to modify, add, ordelete general settings such asmaxBadRecords andignoreUnknownValues. Thereare no configuration settings that are specific to JSON source files, but thereare settings that apply toCSVandGoogle Sheetsfiles. For more information, seeExternalDataConfigurationin the API reference.

Use auto-detect with a Drive data source

Create a table definition file for a Drive data source:

  1. Use thebq mkdef commandwith the--autodetect flag to create a table definition. Themkdef commandgenerates a table definition file in JSON format. The following examplecreates a table definition and writes the output to a file:/tmp/file_name.

    bqmkdef\--autodetect\--source_format=SOURCE_FORMAT\"URI">/tmp/FILE_NAME

    Replace the following:

    • SOURCE_FORMAT: your file format
    • FILE_NAME: the name of your table definition file
    • URI: theDrive URI

      For example,https://drive.google.com/open?id=123ABCD123AbcD123Abcd.

  2. Open the table definition file in a text editor. For example, the commandnano /tmp/file_name opens the file in nano. The fileshould look like the following for a Google Sheets external data source. Notice"autodetect" is set totrue.

    {"autodetect":true,"sourceFormat":"GOOGLE_SHEETS","sourceUris":["URI"]}
  3. (Optional) Manually edit the table definition file to modify, add, ordelete general settings such asmaxBadRecords andignoreUnknownValues. Thereare no configuration settings that are specific to JSON source files, but thereare settings that apply toCSVandGoogle Sheetsfiles. For more information, seeExternalDataConfigurationin the API reference.

  4. To specify a particular sheet or a cell range in a Google Sheets file, addtherange property to theGoogleSheetsOptions object in the table definition file. To query a particular sheet,specify the sheet name. To query a cell range, specify the range in the form:sheet_name!top_left_cell_id:bottom_right_cell_id,for example,"Sheet1!A1:B20". If therange parameter is not specified, thefirst sheet in the file is used.

Use an inline schema

If you do not want to use schema auto-detect, you can create a table definitionfile by providing an inline schema definition. To provide an inline schemadefinition, list the fields and data types on the command line in the followingformat:FIELD:DATA_TYPE,FIELD:DATA_TYPE.

Use an inline schema with a Cloud Storage or Drive data source

Create a table definition for a Cloud Storage or Drive data source by using an inline schema definition:

  1. Use thebq mkdef commandwith the--noautodetect flag to create a table definition. Themkdefcommand generates a table definition file in JSON format. The followingexample creates a table definition and writes the output to a file:/tmp/file_name.

    bqmkdef\--noautodetect\--source_format=SOURCE_FORMAT\"URI"\FIELD:DATA_TYPE,FIELD:DATA_TYPE>/tmp/FILE_NAME

    Replace the following

    • SOURCE_FORMAT: the source file format
    • URI: theCloud Storage URIor yourDrive URI

      For example,gs://mybucket/myfile for Cloud Storage orhttps://drive.google.com/open?id=123ABCD123AbcD123Abcd for Drive.

    • FIELD:DATA_TYPE,FIELD:DATA_TYPE: the schemadefinition

      For example,Name:STRING,Address:STRING, ....

    • FILE_NAME: the name of your table definition file

  2. (Optional) Open the table definition file in a text editor. For example,the commandnano /tmp/file_name opens the file innano. The file should look like the following. Notice"autodetect" is notenabled, and the schema information is written to the table definition file.

    {"schema":{"fields":[{"name":"FIELD","type":"DATA_TYPE"},{"name":"FIELD","type":"DATA_TYPE"}...]},"sourceFormat":"NEWLINE_DELIMITED_JSON","sourceUris":["URI"]}
  3. (Optional) Manually edit the table definition file to modify, add, ordelete general settings such asmaxBadRecords andignoreUnknownValues. Thereare no configuration settings that are specific to JSON source files, but thereare settings that apply toCSVandGoogle Sheetsfiles. For more information, seeExternalDataConfigurationin the API reference.

Use a JSON schema file

If you do not want to use auto-detect or provide an inline schema definition,you can create a JSON schema file and reference it when creating your tabledefinition file. Create the JSON schema file manually on your local machine.Referencing a JSON schema file stored in Cloud Storageor in Drive is not supported.

Use a schema file with a Cloud Storage data source

Create a table definition for a Cloud Storage data sourceusing a JSON schema file:

  1. Use thebq mkdef commandwith the--noautodetect flag to create a table definition. Themkdefcommand generates a table definition file in JSON format. The followingexample creates a table definition and writes the output to a file:/tmp/file_name.

    bqmkdef\--noautodetect\--source_format=SOURCE_FORMAT\"URI"\PATH_TO_SCHEMA_FILE>/tmp/FILE_NAME

    Replace the following:

    • SOURCE_FORMAT: your file format
    • FILE_NAME: the name of your table definition file
    • URI: theCloud Storage URI

      For example,gs://mybucket/myfile.

    • PATH_TO_SCHEMA_FILE: the location of the JSON schemafile on your local machine

  2. (Optional) Open the table definition file in a text editor. For example,the commandnano /tmp/file_name opens the file in
    nano. The file should look like the following. Notice"autodetect" is notenabled, and the schema information is written to the table definition file.

    {"schema":{"fields":[{"name":"FIELD","type":"DATA_TYPE"},{"name":"FIELD","type":"DATA_TYPE"}...]},"sourceFormat":"NEWLINE_DELIMITED_JSON","sourceUris":["URI"]}
  3. (Optional) Manually edit the table definition file to modify, add, ordelete general settings such asmaxBadRecords andignoreUnknownValues. Thereare no configuration settings that are specific to JSON source files, but thereare settings that apply toCSVandGoogle Sheetsfiles. For more information, seeExternalDataConfigurationin the API reference.

Use a schema file with a Drive data source

Create a table definition for a Drive data sourceusing a JSON schema file:

  1. Use thebq mkdef commandwith the--noautodetect flag to create a table definition. Themkdefcommand generates a table definition file in JSON format. The followingexample creates a table definition and writes the output to a file:/tmp/file_name.

    bqmkdef\--noautodetect\--source_format=source_format\"URI"\PATH_TO_SCHEMA_FILE>/tmp/FILE_NAME

    Replace the following:

    • SOURCE_FORMAT: the source file format
    • URI: theDrive URI

      For example,https://drive.google.com/open?id=123ABCD123AbcD123Abcd.

    • PATH_TO_SCHEMA_FILE: the location of the JSON schemafile on your local machine

    • FILE_NAME: the name of your table definition file

  2. Open the table definition file in a text editor. For example, the commandnano /tmp/file_name opens the file in nano. The fileshould look like the following. Notice"autodetect" is not enabled, and theschema information is written to the table definition file.

    {"schema":{"fields":[{"name":"FIELD","type":"DATA_TYPE"},{"name":"FIELD","type":"DATA_TYPE"}...]},"sourceFormat":"GOOGLE_SHEETS","sourceUris":["URI"]}
  3. (Optional) Manually edit the table definition file to modify, add, ordelete general settings such asmaxBadRecords andignoreUnknownValues. Thereare no configuration settings that are specific to JSON source files, but thereare settings that apply toCSVandGoogle Sheetsfiles. For more information, seeExternalDataConfigurationin the API reference.

  4. To specify a particular sheet or a cell range in a Google Sheets file, addtherange property to theGoogleSheetsOptions object in the table definition file. To query a particular sheet,specify the sheet name. To query a cell range, specify the range in the form:sheet_name!top_left_cell_id:bottom_right_cell_id,for example,"Sheet1!A1:B20". If therange parameter is not specified, thefirst sheet in the file is used.

Create a definition file for self-describing formats

Avro, Parquet, and ORC areself-describing formats. Data files in theseformats contain their own schema information. If you use one of these formats asan external data source, then BigQuery automatically retrieves theschema using the source data. When creating a table definition, you don't needto use schema auto-detection, and you don't need to provide an inline schemadefinition or schema file.

You can create a table definition file for Avro, Parquet, or ORC data stored inCloud Storage or Drive:

  1. Use thebq mkdef commandto create a table definition.

    bqmkdef\--source_format=FORMAT\"URI">FILE_NAME

    Replace the following:

    • FORMAT: the source format

    • URI: theCloud Storage URIor yourDrive URI

      For example,gs://mybucket/myfile for Cloud Storage orhttps://drive.google.com/open?id=123ABCD123AbcD123Abcdfor Drive.

    • FILE_NAME: the name of your table definition file

  2. Optional: Open the table definition file in a text editor. The file lookssimilar to the following:

    {"sourceFormat":"AVRO","sourceUris":["URI"]}
  3. Optional: Manually edit the table definition file to modify, add, ordelete general settings such asmaxBadRecords andignoreUnknownValues.For more information, seeExternalDataConfigurationin the API reference.

Create a definition file for hive-partitioned data

Use thebq mkdef commandwith thehive_partitioning_mode and thehive_partitioning_source_uri_prefix flags tocreate a definition file for hive-partitioned data that's stored inCloud Storage, Amazon Simple Storage Service (Amazon S3), or Azure Blob Storage.

Create a definition file for Datastore and Firestore

If you use a Datastore or Firestore export as an external datasource, BigQuery automatically retrieves the schema using theself-describing source data. When creating a table definition, you don't need toprovide an inline schema definition or schema file.

You can create a table definition file for Datastore andFirestore export data stored in Cloud Storage:

  1. Use thebq mkdef commandto create a table definition. You do not need to use the--noautodetectflag with Datastore or Firestore backup files. Schemaauto-detect is disabled for these file types. Themkdef command generates atable definition file in JSON format. The following example creates a tabledefinition and writes the output to a file:/tmp/file_name.

    bqmkdef\--source_format=DATASTORE_BACKUP\"URI">/tmp/FILE_NAME

    Replace the following:

    TheDATASTORE_BACKUP source format is used for bothDatastore and Firestore.

  2. (Optional) Open the table definition file in a text editor. For example,the commandnano /tmp/file_name opens the file in nano.The file should look like the following. Notice there is no need for the"autodetect" setting.

    {"sourceFormat":"DATASTORE_BACKUP","sourceUris":["gs://URI"]}
  3. (Optional) Manually edit the table definition file to modify, add, ordelete settings such asmaxBadRecords andignoreUnknownValues. Thereare no configuration settings that are specific to Datastore andFirestore export files. For more information, seeExternalDataConfigurationin the API reference.

Create a definition file for Bigtable

When you create a table definition file for Bigtable, you manuallygenerate the file in JSON format. Using themkdef command to create a tabledefinition is not supported for Bigtable data sources.Schema auto-detect is also unsupported for Bigtable. For a list ofBigtable table definition options, seeBigtableOptions inthe REST API reference.

A JSON table definition file for Bigtable looks like the following.Using this table definition file, BigQuery reads data from asingle column family, interpreting the values as binary encoded integers.

{"sourceFormat":"BIGTABLE","sourceUris":["https://googleapis.com/bigtable/projects/PROJECT_ID/instances/INSTANCE_ID[/appProfiles/APP_PROFILE_ID]/tables/TABLE_NAME"],"bigtableOptions":{"columnFamilies":[{"familyId":"FAMILY_ID","type":"INTEGER","encoding":"BINARY"}]}}

Replace the following:

  • PROJECT_ID: the project containing yourBigtable cluster
  • INSTANCE_ID: the Bigtable instance ID
  • APP_PROFILE_ID (optional): the ID of the app profilethat you want to use to read your Bigtable data.App profilesettings indicate whether the external tableuses Data Boost or provisioned nodes for compute.
  • TABLE_NAME: the name of the table you're querying
  • FAMILY_ID: the column family identifier

For more information, seeRetrieve the Bigtable URI.

Wildcard support for table definition files

If your data is separated into multiple files, you can use an asterisk (*)wildcard to select multiple files. Use of the asterisk wildcard must followthese rules:

  • The asterisk can appear inside the object name or at the end of the objectname.
  • Using multiple asterisks is unsupported. For example, the pathgs://mybucket/fed-*/temp/*.csv is invalid.
  • Using an asterisk with the bucket name is unsupported.

Examples:

  • The following example shows how to select all of the files in all thefolders which start with the prefixgs://mybucket/fed-samples/fed-sample:

    gs://mybucket/fed-samples/fed-sample*
  • The following example shows how to select only files with a.csv extensionin the folder namedfed-samples and any subfolders offed-samples:

    gs://mybucket/fed-samples/*.csv
  • The following example shows how to select files with a naming pattern offed-sample*.csv in the folder namedfed-samples. This example doesn'tselect files in subfolders offed-samples.

    gs://mybucket/fed-samples/fed-sample*.csv

When using the bq command-line tool, you might need to escape the asterisk on someplatforms.

If you use an asterisk wildcard, enclose the bucket and filename in quotes. Forexample, if you have two files namedfed-sample000001.csv andfed-sample000002.csv and you want to use an asterisk to select both of them,the bucket URI would be"gs://mybucket/fed-sample*".

The* wildcard character is not allowed when creating table definition filesfor the following data sources:

  • Bigtable. For Bigtable data, only one datasource can be specified. The URI value must be a valid HTTPS URL for aBigtable table.
  • Datastore orFirestore. Datastoreor Firestore exports stored inCloud Storage. For Datastorebackups, only one data source can be specified. The URI value must end with.backup_info or.export_metadata.
  • Drive. Data stored in Drive.

What's next

Except as otherwise noted, the content of this page is licensed under theCreative Commons Attribution 4.0 License, and code samples are licensed under theApache 2.0 License. For details, see theGoogle Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-12-15 UTC.