Using schema auto-detection

Schema auto-detection

Schema auto-detection enables BigQuery to infer the schema for CSV, JSON, or Google Sheets data. Schema auto-detection is available when you load data into BigQuery and when you query an external data source.

When auto-detection is enabled, BigQuery infers the data type for each column. BigQuery selects a random file in the data source and scans up to the first 500 rows of data to use as a representative sample. BigQuery then examines each field and attempts to assign a data type to that field based on the values in the sample. If all of the rows in a column are empty, auto-detection defaults to the STRING data type for the column.

If you don't enable schema auto-detection for CSV, JSON, or Google Sheets data, then you must provide the schema manually when creating the table.

You don't need to enable schema auto-detection for Avro, Parquet, ORC, Firestore export, or Datastore export files. These file formats are self-describing, so BigQuery automatically infers the table schema from the source data. For Parquet, Avro, and ORC files, you can optionally provide an explicit schema to override the inferred schema.

You can see the detected schema for a table in the following ways:

  • Use the Google Cloud console.
  • Use the bq command-line tool's bq show command, as shown in the example after this list.
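
For example, the following command displays the schema of a table in JSON format; mydataset.mytable is a placeholder:

bq show --schema --format=prettyjson mydataset.mytable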

When BigQuery detects schemas, it might, on rare occasions, change a field name to make it compatible with GoogleSQL syntax.

Loading data using schema auto-detection

To enable schema auto-detection when loading data, use one of these approaches:

  • In the Google Cloud console, in the Schema section, for Auto detect, check the Schema and input parameters option.
  • In the bq command-line tool, use the bq load command with the --autodetect parameter.

When schema auto-detection is enabled, BigQuery makes a best-effort attempt to automatically infer the schema for CSV and JSON files. The auto-detection logic infers the schema field types by reading up to the first 500 rows of data. Leading lines are skipped if the --skip_leading_rows flag is present. The field types are based on the rows having the most fields. Therefore, auto-detection should work as expected as long as there is at least one row of data that has values in every column/field.

Schema auto-detection is not used with Avro files, Parquet files, ORC files, Firestore export files, or Datastore export files. When you load these files into BigQuery, the table schema is automatically retrieved from the self-describing source data.

To use schema auto-detection when you load JSON or CSV data:

Console

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the left pane, click Explorer.

    If you don't see the left pane, click Expand left pane to open the pane.

  3. In the Explorer pane, expand your project, click Datasets, and then click your dataset.

  4. In the details pane, click Create table.

  5. On the Create table page, in the Source section:

    • For Create table from, select your desired source type.
    • In the source field, browse for the file or Cloud Storage bucket, or enter the Cloud Storage URI. Note that you cannot include multiple URIs in the Google Cloud console, but wildcards are supported. The Cloud Storage bucket must be in the same location as the dataset that contains the table you're creating.

    • For File format, select CSV or JSON.

  6. On the Create table page, in the Destination section:

    • For Dataset name, choose the appropriate dataset.

    • In the Table name field, enter the name of the table you're creating.

    • Verify that Table type is set to Native table.

  7. Click Create table.

bq

Issue the bq load command with the --autodetect parameter.

(Optional) Supply the --location flag and set the value to your location.

The following command loads a file using schema auto-detect:

bq --location=LOCATION load \
--autodetect \
--source_format=FORMAT \
DATASET.TABLE \
PATH_TO_SOURCE

Replace the following:

  • LOCATION: the name of your location. The --location flag is optional. For example, if you are using BigQuery in the Tokyo region, set the flag's value to asia-northeast1. You can set a default value for the location by using the .bigqueryrc file.
  • FORMAT: either NEWLINE_DELIMITED_JSON or CSV.
  • DATASET: the dataset that contains the table into which you're loading data.
  • TABLE: the name of the table into which you're loading data.
  • PATH_TO_SOURCE: the location of the CSV or JSON file.

Examples:

Enter the following command to load myfile.csv from your local machine into a table named mytable that is stored in a dataset named mydataset.

bq load --autodetect --source_format=CSV mydataset.mytable ./myfile.csv

Enter the following command to load myfile.json from your local machine into a table named mytable that is stored in a dataset named mydataset.

bq load --autodetect --source_format=NEWLINE_DELIMITED_JSON \
mydataset.mytable ./myfile.json

API

  1. Create a load job that points to the source data. For information about creating jobs, see Running BigQuery jobs programmatically. Specify your location in the location property in the jobReference section.

  2. Specify the data format by setting the sourceFormat property. To use schema auto-detection, this value must be set to NEWLINE_DELIMITED_JSON or CSV.

  3. Use the autodetect property to set schema auto-detection to true.
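
Put together, the load section of a jobs.insert request body with auto-detection enabled might look like the following sketch; the bucket, project, dataset, and table names are placeholders:

{
  "configuration": {
    "load": {
      "sourceUris": ["gs://mybucket/myfile.json"],
      "sourceFormat": "NEWLINE_DELIMITED_JSON",
      "autodetect": true,
      "destinationTable": {
        "projectId": "myproject",
        "datasetId": "mydataset",
        "tableId": "mytable"
      }
    }
  }
}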

Go

Before trying this sample, follow the Go setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Go API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

import (
	"context"
	"fmt"

	"cloud.google.com/go/bigquery"
)

// importJSONAutodetectSchema demonstrates loading data from newline-delimited JSON data in Cloud Storage
// and using schema autodetection to identify the available columns.
func importJSONAutodetectSchema(projectID, datasetID, tableID string) error {
	// projectID := "my-project-id"
	// datasetID := "mydataset"
	// tableID := "mytable"
	ctx := context.Background()
	client, err := bigquery.NewClient(ctx, projectID)
	if err != nil {
		return fmt.Errorf("bigquery.NewClient: %v", err)
	}
	defer client.Close()

	gcsRef := bigquery.NewGCSReference("gs://cloud-samples-data/bigquery/us-states/us-states.json")
	gcsRef.SourceFormat = bigquery.JSON
	gcsRef.AutoDetect = true
	loader := client.Dataset(datasetID).Table(tableID).LoaderFrom(gcsRef)
	loader.WriteDisposition = bigquery.WriteEmpty

	job, err := loader.Run(ctx)
	if err != nil {
		return err
	}
	status, err := job.Wait(ctx)
	if err != nil {
		return err
	}

	if status.Err() != nil {
		return fmt.Errorf("job completed with error: %v", status.Err())
	}
	return nil
}

Java

Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryException;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.FormatOptions;
import com.google.cloud.bigquery.Job;
import com.google.cloud.bigquery.JobInfo;
import com.google.cloud.bigquery.LoadJobConfiguration;
import com.google.cloud.bigquery.TableId;

// Sample to load JSON data with autodetect schema from Cloud Storage into a new BigQuery table
public class LoadJsonFromGCSAutodetect {

  public static void runLoadJsonFromGCSAutodetect() {
    // TODO(developer): Replace these variables before running the sample.
    String datasetName = "MY_DATASET_NAME";
    String tableName = "MY_TABLE_NAME";
    String sourceUri = "gs://cloud-samples-data/bigquery/us-states/us-states.json";
    loadJsonFromGCSAutodetect(datasetName, tableName, sourceUri);
  }

  public static void loadJsonFromGCSAutodetect(String datasetName, String tableName, String sourceUri) {
    try {
      // Initialize client that will be used to send requests. This client only needs to be created
      // once, and can be reused for multiple requests.
      BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

      TableId tableId = TableId.of(datasetName, tableName);
      LoadJobConfiguration loadConfig =
          LoadJobConfiguration.newBuilder(tableId, sourceUri)
              .setFormatOptions(FormatOptions.json())
              .setAutodetect(true)
              .build();

      // Load data from a GCS JSON file into the table
      Job job = bigquery.create(JobInfo.of(loadConfig));
      // Blocks until this load table job completes its execution, either failing or succeeding.
      job = job.waitFor();
      if (job.isDone()) {
        System.out.println("Json Autodetect from GCS successfully loaded in a table");
      } else {
        System.out.println(
            "BigQuery was unable to load into the table due to an error:"
                + job.getStatus().getError());
      }
    } catch (BigQueryException | InterruptedException e) {
      System.out.println("Column not added during load append \n" + e.toString());
    }
  }
}
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryException;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.CsvOptions;
import com.google.cloud.bigquery.Job;
import com.google.cloud.bigquery.JobInfo;
import com.google.cloud.bigquery.LoadJobConfiguration;
import com.google.cloud.bigquery.TableId;

// Sample to load CSV data with autodetect schema from Cloud Storage into a new BigQuery table
public class LoadCsvFromGcsAutodetect {

  public static void main(String[] args) {
    // TODO(developer): Replace these variables before running the sample.
    String datasetName = "MY_DATASET_NAME";
    String tableName = "MY_TABLE_NAME";
    String sourceUri = "gs://cloud-samples-data/bigquery/us-states/us-states.csv";
    loadCsvFromGcsAutodetect(datasetName, tableName, sourceUri);
  }

  public static void loadCsvFromGcsAutodetect(String datasetName, String tableName, String sourceUri) {
    try {
      // Initialize client that will be used to send requests. This client only needs to be created
      // once, and can be reused for multiple requests.
      BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

      TableId tableId = TableId.of(datasetName, tableName);

      // Skip header row in the file.
      CsvOptions csvOptions = CsvOptions.newBuilder().setSkipLeadingRows(1).build();

      LoadJobConfiguration loadConfig =
          LoadJobConfiguration.newBuilder(tableId, sourceUri)
              .setFormatOptions(csvOptions)
              .setAutodetect(true)
              .build();

      // Load data from a GCS CSV file into the table
      Job job = bigquery.create(JobInfo.of(loadConfig));
      // Blocks until this load table job completes its execution, either failing or succeeding.
      job = job.waitFor();
      if (job.isDone() && job.getStatus().getError() == null) {
        System.out.println("CSV Autodetect from GCS successfully loaded in a table");
      } else {
        System.out.println(
            "BigQuery was unable to load into the table due to an error:"
                + job.getStatus().getError());
      }
    } catch (BigQueryException | InterruptedException e) {
      System.out.println("Column not added during load append \n" + e.toString());
    }
  }
}

Node.js

Before trying this sample, follow the Node.js setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Node.js API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

// Import the Google Cloud client libraries
const {BigQuery} = require('@google-cloud/bigquery');
const {Storage} = require('@google-cloud/storage');

/**
 * TODO(developer): Uncomment the following lines before running the sample.
 */
// const datasetId = "my_dataset";
// const tableId = "my_table";

/**
 * This sample loads the JSON file at
 * https://storage.googleapis.com/cloud-samples-data/bigquery/us-states/us-states.json
 *
 * TODO(developer): Replace the following lines with the path to your file.
 */
const bucketName = 'cloud-samples-data';
const filename = 'bigquery/us-states/us-states.json';

async function loadJSONFromGCSAutodetect() {
  // Imports a GCS file into a table with autodetected schema.

  // Instantiate clients
  const bigquery = new BigQuery();
  const storage = new Storage();

  // Configure the load job. For full list of options, see:
  // https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobConfigurationLoad
  const metadata = {
    sourceFormat: 'NEWLINE_DELIMITED_JSON',
    autodetect: true,
    location: 'US',
  };

  // Load data from a Google Cloud Storage file into the table
  const [job] = await bigquery
    .dataset(datasetId)
    .table(tableId)
    .load(storage.bucket(bucketName).file(filename), metadata);
  // load() waits for the job to finish
  console.log(`Job ${job.id} completed.`);

  // Check the job's status for errors
  const errors = job.status.errors;
  if (errors && errors.length > 0) {
    throw errors;
  }
}
loadJSONFromGCSAutodetect();

PHP

Before trying this sample, follow the PHP setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery PHP API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

use Google\Cloud\BigQuery\BigQueryClient;

/**
 * Imports data to the given table from json file present in GCS by auto
 * detecting options and schema.
 *
 * @param string $projectId The project Id of your Google Cloud Project.
 * @param string $datasetId The BigQuery dataset ID.
 * @param string $tableId The BigQuery table ID.
 */
function import_from_storage_json_autodetect(
    string $projectId,
    string $datasetId,
    string $tableId = 'us_states'
): void {
    // instantiate the bigquery table service
    $bigQuery = new BigQueryClient([
        'projectId' => $projectId,
    ]);
    $dataset = $bigQuery->dataset($datasetId);
    $table = $dataset->table($tableId);

    // create the import job
    $gcsUri = 'gs://cloud-samples-data/bigquery/us-states/us-states.json';
    $loadConfig = $table->loadFromStorage($gcsUri)->autodetect(true)->sourceFormat('NEWLINE_DELIMITED_JSON');
    $job = $table->runJob($loadConfig);

    // check if the job is complete
    $job->reload();
    if (!$job->isComplete()) {
        throw new \Exception('Job has not yet completed', 500);
    }
    // check if the job has errors
    if (isset($job->info()['status']['errorResult'])) {
        $error = $job->info()['status']['errorResult']['message'];
        printf('Error running job: %s' . PHP_EOL, $error);
    } else {
        print('Data imported successfully' . PHP_EOL);
    }
}

Python

To enable schema auto-detection, set the LoadJobConfig.autodetect property to True.

Before trying this sample, follow the Python setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Python API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

# TODO(developer): Set table_id to the ID of the table to create.
# table_id = "your-project.your_dataset.your_table_name"

job_config = bigquery.LoadJobConfig(
    autodetect=True, source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON
)
uri = "gs://cloud-samples-data/bigquery/us-states/us-states.json"

load_job = client.load_table_from_uri(
    uri, table_id, job_config=job_config
)  # Make an API request.

load_job.result()  # Waits for the job to complete.

destination_table = client.get_table(table_id)
print("Loaded {} rows.".format(destination_table.num_rows))

Ruby

Before trying this sample, follow the Ruby setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Ruby API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

require"google/cloud/bigquery"defload_table_gcs_json_autodetectdataset_id="your_dataset_id"bigquery=Google::Cloud::Bigquery.newdataset=bigquery.datasetdataset_idgcs_uri="gs://cloud-samples-data/bigquery/us-states/us-states.json"table_id="us_states"load_job=dataset.load_jobtable_id,gcs_uri,format:"json",autodetect:trueputs"Starting job#{load_job.job_id}"load_job.wait_until_done!# Waits for table load to complete.puts"Job finished."table=dataset.tabletable_idputs"Loaded#{table.rows_count} rows to table#{table.id}"end

Schema auto-detection for external data sources

Schema auto-detection can be used with CSV, JSON, and Google Sheets external data sources. When schema auto-detection is enabled, BigQuery makes a best-effort attempt to automatically infer the schema from the source data. If you don't enable schema auto-detection for these sources, then you must provide an explicit schema.

You don't need to enable schema auto-detection when you query external Avro, Parquet, ORC, Firestore export, or Datastore export files. These file formats are self-describing, so BigQuery automatically infers the table schema from the source data. For Parquet, Avro, and ORC files, you can optionally provide an explicit schema to override the inferred schema.

Using the Google Cloud console, you can enable schema auto-detection by checking the Schema and input parameters option for Auto detect.

Using the bq command-line tool, you can enable schema auto-detection when you create a table definition file for CSV, JSON, or Google Sheets data. When using the bq tool to create a table definition file, pass the --autodetect flag to the mkdef command to enable schema auto-detection, or pass the --noautodetect flag to disable auto-detection.

When you use the --autodetect flag, the autodetect setting is set to true in the table definition file. When you use the --noautodetect flag, the autodetect setting is set to false. If you do not provide a schema definition for the external data source when you create a table definition, and you do not use the --noautodetect or --autodetect flag, the autodetect setting defaults to true.
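
For example, the following command writes a table definition file with auto-detection enabled for a CSV file in Cloud Storage; the URI and output file name are placeholders:

bq mkdef --autodetect --source_format=CSV "gs://mybucket/myfile.csv" > mytable_def.json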

When you create a table definition file by using the API, set the value of the autodetect property to true or false. Setting autodetect to true enables auto-detection. Setting autodetect to false disables auto-detection.
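
As a sketch, a table definition with auto-detection enabled might contain the following; the source URI is a placeholder:

{
  "autodetect": true,
  "sourceFormat": "CSV",
  "sourceUris": ["gs://mybucket/myfile.csv"]
}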

Auto-detection details

In addition to detecting schema details, auto-detection recognizes the following:

Compression

BigQuery recognizes gzip-compatible file compression when opening a file.

Date and time values

BigQuery detects date and time values based on the formatting of the source data.

Values in DATE columns must be in the following format: YYYY-MM-DD.

Values in TIME columns must be in the following format: HH:MM:SS[.SSSSSS] (the fractional-second component is optional).

For TIMESTAMP columns, BigQuery detects a wide array of timestamp formats, including, but not limited to:

  • YYYY-MM-DD HH:MM
  • YYYY-MM-DD HH:MM:SS
  • YYYY-MM-DD HH:MM:SS.SSSSSS
  • YYYY/MM/DD HH:MM

A timestamp can also contain a UTC offset or the UTC zone designator ('Z').

Here are some examples of values that BigQuery will automatically detect as timestamp values:

  • 2018-08-19 12:11
  • 2018-08-19 12:11:35.22
  • 2018/08/19 12:11
  • 2018-08-19 07:11:35.220 -05:00

If auto-detection isn't enabled, and your value is in a format not present in the preceding examples, then BigQuery can only load the column as a STRING data type. You can enable auto-detection to have BigQuery recognize these columns as timestamps. For example, BigQuery will only load 2025-06-16T16:55:22Z as a timestamp if you enable auto-detection.

Alternatively, you can preprocess the source data before loading it. For example, if you are exporting CSV data from a spreadsheet, set the date format to match one of the examples shown here. You can also transform the data after loading it into BigQuery.
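
As a minimal preprocessing sketch, assuming a CSV file with a created_at column in DD/MM/YYYY HH:MM format (the file and column names are hypothetical), the following Python script rewrites the column into one of the formats listed earlier:

import csv
from datetime import datetime

# Hypothetical input and output file names.
with open("raw.csv", newline="") as src, open("clean.csv", "w", newline="") as dst:
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        # Convert a value like "19/08/2018 12:11" to "2018-08-19 12:11:00".
        parsed = datetime.strptime(row["created_at"], "%d/%m/%Y %H:%M")
        row["created_at"] = parsed.strftime("%Y-%m-%d %H:%M:%S")
        writer.writerow(row)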

Schema auto-detection for CSV data

CSV delimiter

BigQuery detects the following delimiters:

  • comma ( , )
  • pipe ( | )
  • tab ( \t )

CSV header

BigQuery infers headers by comparing the first row of the file with other rows in the file. If the first line contains only strings, and the other lines contain other data types, BigQuery assumes that the first row is a header row. BigQuery assigns column names based on the field names in the header row. The names might be modified to meet the naming rules for columns in BigQuery. For example, spaces will be replaced with underscores.

Otherwise, BigQuery assumes the first row is a data row, and assigns generic column names such as string_field_1. Note that after a table is created, the column names cannot be updated in the schema, although you can change the names manually after the table is created. Another option is to provide an explicit schema instead of using auto-detection.

You might have a CSV file with a header row, where all of the data fields are strings. In that case, BigQuery will not automatically detect that the first row is a header. Use the --skip_leading_rows option to skip the header row. Otherwise, the header is imported as data. Also consider providing an explicit schema in this case, so that you can assign column names.
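
For example, the following command skips one header row while auto-detecting the schema; the file and table names are placeholders:

bq load --autodetect --skip_leading_rows=1 --source_format=CSV mydataset.mytable ./myfile.csv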

CSV quoted new lines

BigQuery detects quoted new line characters within a CSV field and does not interpret the quoted new line character as a row boundary.

Schema auto-detection for JSON data

JSON nested and repeated fields

BigQuery infers nested and repeated fields in JSON files. If a field value is a JSON object, then BigQuery loads the column as a RECORD type. If a field value is an array, then BigQuery loads the column as a repeated column. For an example of JSON data with nested and repeated data, see Loading nested and repeated JSON data.
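
As an illustrative sketch (the field names are hypothetical), auto-detection would infer address in the following record as a RECORD column and phone_numbers as a repeated STRING column:

{"name": "Alice", "address": {"city": "Seattle", "state": "WA"}, "phone_numbers": ["206-555-0100", "206-555-0101"]}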

String conversion

If you enable schema auto-detection, then BigQuery converts strings into Boolean, numeric, or date/time types when possible. For example, using the following JSON data, schema auto-detection converts the id field to an INTEGER column:

{"name":"Alice","id":"12"}{"name":"Bob","id":"34"}{"name":"Charles","id":"45"}

For more information, see Loading JSON data from Cloud Storage.

Schema auto-detection for Google Sheets

For Sheets, BigQuery auto-detects whether the first row is a header row, similar to auto-detection for CSV files. If the first line is identified as a header, BigQuery assigns column names based on the field names in the header row and skips the row. The names might be modified to meet the naming rules for columns in BigQuery. For example, spaces will be replaced with underscores.

Table security

To control access to tables in BigQuery, see Control access to resources with IAM.
