Loading JSON data from Cloud Storage

You can load newline-delimited JSON (ndJSON) data from Cloud Storage into a new table or partition, or append to or overwrite an existing table or partition. When your data is loaded into BigQuery, it is converted into columnar format for Capacitor (BigQuery's storage format).

When you load data from Cloud Storage into a BigQuery table, the dataset that contains the table must be in the same regional or multi-regional location as the Cloud Storage bucket.

The ndJSON format is the same format as the JSON Lines format.

Limitations

You are subject to the following limitations when you load data into BigQuery from a Cloud Storage bucket:

  • BigQuery does not guarantee data consistency for external data sources. Changes to the underlying data while a query is running can result in unexpected behavior.
  • BigQuery doesn't support Cloud Storage object versioning. If you include a generation number in the Cloud Storage URI, then the load job fails.

When you load JSON files into BigQuery, note the following:

  • JSON data must be newline delimited, or ndJSON. Each JSON object must be on a separate line in the file.
  • If you use gzip compression, BigQuery cannot read the data in parallel. Loading compressed JSON data into BigQuery is slower than loading uncompressed data.
  • You cannot include both compressed and uncompressed files in the same load job.
  • The maximum size for a gzip file is 4 GB.
  • BigQuery supports the JSON type even if schema information is not known at the time of ingestion. A field that is declared as JSON type is loaded with the raw JSON values.

  • If you use the BigQuery API to load an integer outside the range of [-2^53+1, 2^53-1] (usually this means larger than 9,007,199,254,740,991) into an integer (INT64) column, pass it as a string to avoid data corruption. This issue is caused by a limitation on integer size in JSON or ECMAScript. For more information, see the Numbers section of RFC 7159. (A sketch of this workaround follows this list.)

  • When you load CSV or JSON data, values in DATE columns must use the dash (-) separator and the date must be in the following format: YYYY-MM-DD (year-month-day).
  • When you load JSON or CSV data, values in TIMESTAMP columns must use a dash (-) or slash (/) separator for the date portion of the timestamp, and the date must be in one of the following formats: YYYY-MM-DD (year-month-day) or YYYY/MM/DD (year/month/day). The hh:mm:ss (hour-minute-second) portion of the timestamp must use a colon (:) separator.
  • Your files must meet the JSON file size limits described in the load jobs limits.
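
For example, the following Python sketch serializes out-of-range integers as JSON strings before loading them into an INT64 column. It uses the google-cloud-bigquery client library; the table ID and rows are placeholders.

import io
import json

from google.cloud import bigquery

# Hypothetical table and rows used for illustration.
table_id = "your-project.your_dataset.large_ints"
rows = [
    {"id": 1, "big_value": 9007199254740993},  # outside the JSON-safe integer range
    {"id": 2, "big_value": 12345},
]

# Serialize values outside [-2^53+1, 2^53-1] as strings. With an explicit
# integer schema, BigQuery still loads the quoted values as integers.
SAFE_MAX = 2**53 - 1
lines = []
for row in rows:
    if abs(row["big_value"]) > SAFE_MAX:
        row["big_value"] = str(row["big_value"])
    lines.append(json.dumps(row))
ndjson = io.BytesIO("\n".join(lines).encode("utf-8"))

client = bigquery.Client()
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    schema=[
        bigquery.SchemaField("id", "INTEGER"),
        bigquery.SchemaField("big_value", "INTEGER"),
    ],
)
client.load_table_from_file(ndjson, table_id, job_config=job_config).result()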

Before you begin

Grant Identity and Access Management (IAM) roles that give users the necessary permissions to perform each task in this document, and create a dataset to store your data.

Required permissions

To load data into BigQuery, you need IAM permissions to run a load job and load data into BigQuery tables and partitions. If you are loading data from Cloud Storage, you also need IAM permissions to access the bucket that contains your data.

Permissions to load data into BigQuery

To load data into a new BigQuery table or partition or to append or overwrite an existing table or partition, you need the following IAM permissions:

  • bigquery.tables.create
  • bigquery.tables.updateData
  • bigquery.tables.update
  • bigquery.jobs.create

Each of the following predefined IAM roles includes the permissions that you need in order to load data into a BigQuery table or partition:

  • roles/bigquery.dataEditor
  • roles/bigquery.dataOwner
  • roles/bigquery.admin (includes the bigquery.jobs.create permission)
  • roles/bigquery.user (includes the bigquery.jobs.create permission)
  • roles/bigquery.jobUser (includes the bigquery.jobs.create permission)

Additionally, if you have the bigquery.datasets.create permission, you can create and update tables using a load job in the datasets that you create.

For more information on IAM roles and permissions in BigQuery, see Predefined roles and permissions.

Permissions to load data from Cloud Storage

To get the permissions that you need to load data from a Cloud Storage bucket, ask your administrator to grant you the Storage Admin (roles/storage.admin) IAM role on the bucket. For more information about granting roles, see Manage access to projects, folders, and organizations.

This predefined role contains the permissions required to load data from a Cloud Storage bucket. To see the exact permissions that are required, expand the Required permissions section:

Required permissions

The following permissions are required to load data from a Cloud Storage bucket:

  • storage.buckets.get
  • storage.objects.get
  • storage.objects.list (required if you are using a URI wildcard)

You might also be able to get these permissions with custom roles or other predefined roles.

Create a dataset

Create a BigQuery dataset to store your data.
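
If you prefer to create the dataset programmatically, the following Python sketch uses the google-cloud-bigquery client library; the project, dataset ID, and location are placeholders.

from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical dataset ID; the dataset location must be compatible with
# your Cloud Storage bucket's location.
dataset = bigquery.Dataset("your-project.mydataset")
dataset.location = "US"

dataset = client.create_dataset(dataset, exists_ok=True)
print("Created dataset {}".format(dataset.full_dataset_id))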

JSON compression

You can use the gzip utility to compress JSON files. Note that gzip performs full file compression, unlike the file content compression performed by compression codecs for other file formats, such as Avro. Using gzip to compress your JSON files might have a performance impact; for more information about the trade-offs, see Loading compressed and uncompressed data.
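
As an illustration, the following Python snippet writes a few ndJSON records and gzips the entire file before you upload it to Cloud Storage; the file name and records are placeholders. Remember that compressed JSON loads more slowly than uncompressed JSON because BigQuery cannot read the gzipped file in parallel.

import gzip
import json

# Hypothetical records; each JSON object must occupy its own line (ndJSON).
records = [
    {"name": "Alice", "post_abbr": "WA"},
    {"name": "Bob", "post_abbr": "OR"},
]

# gzip compresses the whole file, so BigQuery cannot split it for parallel reads.
with gzip.open("us-states.json.gz", "wt", encoding="utf-8") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")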

Loading JSON data into a new table

To load JSON data from Cloud Storage into a new BigQuery table:

Console

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the left pane, click Explorer.
  3. In the Explorer pane, expand your project, click Datasets, and then select a dataset.
  4. In the Dataset info section, click Create table.
  5. In the Create table pane, specify the following details:
    1. In the Source section, select Google Cloud Storage in the Create table from list. Then, do the following:
      1. Select a file from the Cloud Storage bucket, or enter the Cloud Storage URI. You cannot include multiple URIs in the Google Cloud console, but wildcards are supported. The Cloud Storage bucket must be in the same location as the dataset that contains the table you want to create, append, or overwrite.
      2. For File format, select JSONL (Newline delimited JSON).
    2. In the Destination section, specify the following details:
      1. For Dataset, select the dataset in which you want to create the table.
      2. In the Table field, enter the name of the table that you want to create.
      3. Verify that the Table type field is set to Native table.
    3. In the Schema section, enter the schema definition. To enable the auto detection of a schema, select Auto detect. You can enter schema information manually by using one of the following methods:
      • Option 1: Click Edit as text and paste the schema in the form of a JSON array. When you use a JSON array, you generate the schema using the same process as creating a JSON schema file. You can view the schema of an existing table in JSON format by entering the following command:
        bq show --format=prettyjson dataset.table
      • Option 2: Click Add field and enter the table schema. Specify each field's Name, Type, and Mode.
    4. Optional: Specify Partition and cluster settings. For more information, see Creating partitioned tables and Creating and using clustered tables.
    5. Click Advanced options and do the following:
      • For Write preference, leave Write if empty selected. This option creates a new table and loads your data into it.
      • For Number of errors allowed, accept the default value of 0 or enter the maximum number of rows containing errors that can be ignored. If the number of rows with errors exceeds this value, the job results in an invalid message and fails. This option applies only to CSV and JSON files.
      • For Time zone, enter the default time zone to apply when parsing timestamp values that have no specific time zone. Check here for more valid time zone names. If this value is not present, timestamp values without a specific time zone are parsed using the default time zone UTC. (Preview).
      • For Date Format, enter the format elements that define how the DATE values are formatted in the input files. This field expects SQL-style format elements (for example, MM/DD/YYYY). If this value is present, this format is the only compatible DATE format. Schema autodetection also decides the DATE column type based on this format instead of the existing format. If this value is not present, the DATE field is parsed with the default formats. (Preview).
      • For Datetime Format, enter the format elements that define how the DATETIME values are formatted in the input files. This field expects SQL-style format elements (for example, MM/DD/YYYY HH24:MI:SS.FF3). If this value is present, this format is the only compatible DATETIME format. Schema autodetection also decides the DATETIME column type based on this format instead of the existing format. If this value is not present, the DATETIME field is parsed with the default formats. (Preview).
      • For Time Format, enter the format elements that define how the TIME values are formatted in the input files. This field expects SQL-style format elements (for example, HH24:MI:SS.FF3). If this value is present, this format is the only compatible TIME format. Schema autodetection also decides the TIME column type based on this format instead of the existing format. If this value is not present, the TIME field is parsed with the default formats. (Preview).
      • For Timestamp Format, enter the format elements that define how the TIMESTAMP values are formatted in the input files. This field expects SQL-style format elements (for example, MM/DD/YYYY HH24:MI:SS.FF3). If this value is present, this format is the only compatible TIMESTAMP format. Schema autodetection also decides the TIMESTAMP column type based on this format instead of the existing format. If this value is not present, the TIMESTAMP field is parsed with the default formats. (Preview).
      • If you want to ignore values in a row that are not present in the table's schema, then select Unknown values.
      • For Encryption, click Customer-managed key to use a Cloud Key Management Service key. If you leave the Google-managed key setting, BigQuery encrypts the data at rest.
    6. Click Create table.
Note: When you load data into an empty table by using the Google Cloud console, you cannot add a label, description, table expiration, or partition expiration.

After the table is created, you can update the table's expiration, description, and labels, but you cannot add a partition expiration after a table is created using the Google Cloud console. For more information, see Managing tables.

SQL

Use the LOAD DATA DDL statement. The following example loads a JSON file into the new table mytable:

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the query editor, enter the following statement:

    LOAD DATA OVERWRITE mydataset.mytable
    (x INT64, y STRING)
    FROM FILES (
      format = 'JSON',
      uris = ['gs://bucket/path/file.json']);

  3. ClickRun.

For more information about how to run queries, see Run an interactive query.

bq

Use the bq load command, specify NEWLINE_DELIMITED_JSON using the --source_format flag, and include a Cloud Storage URI. You can include a single URI, a comma-separated list of URIs, or a URI containing a wildcard. Supply the schema inline, in a schema definition file, or use schema auto-detect.

(Optional) Supply the --location flag and set the value to your location.

Other optional flags include:

  • --max_bad_records: An integer that specifies the maximum number of bad records allowed before the entire job fails. The default value is 0. At most, five errors of any type are returned regardless of the --max_bad_records value.
  • --ignore_unknown_values: When specified, allows and ignores extra, unrecognized values in CSV or JSON data.
  • --time_zone: (Preview) An optional default time zone that applies when parsing timestamp values that have no specific time zone in CSV or JSON data.
  • --date_format: (Preview) An optional custom string that defines how the DATE values are formatted in CSV or JSON data.
  • --datetime_format: (Preview) An optional custom string that defines how the DATETIME values are formatted in CSV or JSON data.
  • --time_format: (Preview) An optional custom string that defines how the TIME values are formatted in CSV or JSON data.
  • --timestamp_format: (Preview) An optional custom string that defines how the TIMESTAMP values are formatted in CSV or JSON data.
  • --autodetect: When specified, enables schema auto-detection for CSV and JSON data.
  • --time_partitioning_type: Enables time-based partitioning on a table and sets the partition type. Possible values are HOUR, DAY, MONTH, and YEAR. This flag is optional when you create a table partitioned on a DATE, DATETIME, or TIMESTAMP column. The default partition type for time-based partitioning is DAY. You cannot change the partitioning specification on an existing table.
  • --time_partitioning_expiration: An integer that specifies (in seconds) when a time-based partition should be deleted. The expiration time evaluates to the partition's UTC date plus the integer value.
  • --time_partitioning_field: The DATE or TIMESTAMP column used to create a partitioned table. If time-based partitioning is enabled without this value, an ingestion-time partitioned table is created.
  • --require_partition_filter: When enabled, this option requires users to include a WHERE clause that specifies the partitions to query. Requiring a partition filter can reduce cost and improve performance. For more information, see Require a partition filter in queries.
  • --clustering_fields: A comma-separated list of up to four column names used to create a clustered table.
  • --destination_kms_key: The Cloud KMS key for encryption of the table data.

To load JSON data into BigQuery, enter the following command:

bq --location=LOCATION load \
--source_format=FORMAT \
DATASET.TABLE \
PATH_TO_SOURCE \
SCHEMA

Replace the following:

  • LOCATION: your location. The --location flag is optional. For example, if you are using BigQuery in the Tokyo region, you can set the flag's value to asia-northeast1. You can set a default value for the location using the .bigqueryrc file.
  • FORMAT: NEWLINE_DELIMITED_JSON.
  • DATASET: an existing dataset.
  • TABLE: the name of the table into which you're loading data.
  • PATH_TO_SOURCE: a fully qualified Cloud Storage URI or a comma-separated list of URIs. Wildcards are also supported.
  • SCHEMA: a valid schema. The schema can be a local JSON file, or it can be typed inline as part of the command. If you use a schema file, do not give it an extension. You can also use the --autodetect flag instead of supplying a schema definition.

Examples:

The following command loads data from gs://mybucket/mydata.json into a table named mytable in mydataset. The schema is defined in a local schema file named myschema.

bq load \
--source_format=NEWLINE_DELIMITED_JSON \
mydataset.mytable \
gs://mybucket/mydata.json \
./myschema

The following command loads data from gs://mybucket/mydata.json into a new ingestion-time partitioned table named mytable in mydataset. The schema is defined in a local schema file named myschema.

bq load \
--source_format=NEWLINE_DELIMITED_JSON \
--time_partitioning_type=DAY \
mydataset.mytable \
gs://mybucket/mydata.json \
./myschema

The following command loads data from gs://mybucket/mydata.json into a partitioned table named mytable in mydataset. The table is partitioned on the mytimestamp column. The schema is defined in a local schema file named myschema.

bq load \
--source_format=NEWLINE_DELIMITED_JSON \
--time_partitioning_field mytimestamp \
mydataset.mytable \
gs://mybucket/mydata.json \
./myschema

The following command loads data from gs://mybucket/mydata.json into a table named mytable in mydataset. The schema is auto detected.

bq load \
--autodetect \
--source_format=NEWLINE_DELIMITED_JSON \
mydataset.mytable \
gs://mybucket/mydata.json

The following command loads data from gs://mybucket/mydata.json into a table named mytable in mydataset. The schema is defined inline in the format FIELD:DATA_TYPE,FIELD:DATA_TYPE.

bq load \
--source_format=NEWLINE_DELIMITED_JSON \
mydataset.mytable \
gs://mybucket/mydata.json \
qtr:STRING,sales:FLOAT,year:STRING

Note: When you specify the schema using the bq tool, you cannot include a RECORD (STRUCT) type, you cannot include a field description, and you cannot specify the field mode. All field modes default to NULLABLE. To include field descriptions, modes, and RECORD types, supply a JSON schema file instead.

The following command loads data from multiple files in gs://mybucket/ into a table named mytable in mydataset. The Cloud Storage URI uses a wildcard. The schema is auto detected.

bq load \
--autodetect \
--source_format=NEWLINE_DELIMITED_JSON \
mydataset.mytable \
gs://mybucket/mydata*.json

The following command loads data from multiple files in gs://mybucket/ into a table named mytable in mydataset. The command includes a comma-separated list of Cloud Storage URIs with wildcards. The schema is defined in a local schema file named myschema.

bq load \
--source_format=NEWLINE_DELIMITED_JSON \
mydataset.mytable \
"gs://mybucket/00/*.json","gs://mybucket/01/*.json" \
./myschema

API

  1. Create a load job that points to the source data in Cloud Storage.

  2. (Optional) Specify your location in the location property in the jobReference section of the job resource.

  3. The source URIs property must be fully qualified, in the format gs://BUCKET/OBJECT. Each URI can contain one '*' wildcard character.

  4. Specify the JSON data format by setting the sourceFormat property to NEWLINE_DELIMITED_JSON.

  5. To check the job status, call jobs.get(JOB_ID*), replacing JOB_ID with the ID of the job returned by the initial request.

    • If status.state = DONE, the job completed successfully.
    • If the status.errorResult property is present, the request failed, and that object includes information describing what went wrong. When a request fails, no table is created and no data is loaded.
    • If status.errorResult is absent, the job finished successfully, although there might have been some nonfatal errors, such as problems importing a few rows. Nonfatal errors are listed in the returned job object's status.errors property.

API notes:

  • Load jobs are atomic and consistent; if a load job fails, none of the data is available, and if a load job succeeds, all of the data is available.

  • As a best practice, generate a unique ID and pass it as jobReference.jobId when calling jobs.insert to create a load job. This approach is more robust to network failure because the client can poll or retry on the known job ID (see the sketch after this list).

  • Calling jobs.insert on a given job ID is idempotent. You can retry as many times as you like on the same job ID, and at most, one of those operations succeeds.
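
The following Python sketch illustrates this retry pattern with the google-cloud-bigquery client library, which passes a client-generated job ID to the underlying jobs.insert call; the table ID and source URI are placeholders.

import uuid

from google.api_core.exceptions import GoogleAPICallError
from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical destination table and source URI.
table_id = "your-project.your_dataset.your_table_name"
uri = "gs://mybucket/mydata.json"

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    autodetect=True,
)

# Generate the job ID client-side so the same ID can be reused on retry.
job_id = "load_mydata_{}".format(uuid.uuid4())

try:
    load_job = client.load_table_from_uri(
        uri, table_id, job_id=job_id, job_config=job_config
    )
except GoogleAPICallError:
    # If the insert call failed (for example, because of a transient network
    # error), polling the same job ID shows whether the job was actually created.
    load_job = client.get_job(job_id)

load_job.result()  # Waits for the load job to complete.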

C#

Before trying this sample, follow the C# setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery C# API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

Use the BigQueryClient.CreateLoadJob() method to start a load job from Cloud Storage. To use JSONL, create a CreateLoadJobOptions object and set its SourceFormat property to FileFormat.NewlineDelimitedJson.

using Google.Apis.Bigquery.v2.Data;
using Google.Cloud.BigQuery.V2;
using System;

public class BigQueryLoadTableGcsJson
{
    public void LoadTableGcsJson(
        string projectId = "your-project-id",
        string datasetId = "your_dataset_id"
    )
    {
        BigQueryClient client = BigQueryClient.Create(projectId);
        var gcsURI = "gs://cloud-samples-data/bigquery/us-states/us-states.json";
        var dataset = client.GetDataset(datasetId);
        var schema = new TableSchemaBuilder
        {
            { "name", BigQueryDbType.String },
            { "post_abbr", BigQueryDbType.String }
        }.Build();
        TableReference destinationTableRef = dataset.GetTableReference(
            tableId: "us_states");
        // Create job configuration
        var jobOptions = new CreateLoadJobOptions()
        {
            SourceFormat = FileFormat.NewlineDelimitedJson
        };
        // Create and run job
        BigQueryJob loadJob = client.CreateLoadJob(
            sourceUri: gcsURI, destination: destinationTableRef,
            schema: schema, options: jobOptions);
        loadJob = loadJob.PollUntilCompleted().ThrowOnAnyError();  // Waits for the job to complete.
        // Display the number of rows uploaded
        BigQueryTable table = client.GetTable(destinationTableRef);
        Console.WriteLine(
            $"Loaded {table.Resource.NumRows} rows to {table.FullyQualifiedId}");
    }
}

Go

Before trying this sample, follow the Go setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Go API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

import("context""fmt""cloud.google.com/go/bigquery")// importJSONExplicitSchema demonstrates loading newline-delimited JSON data from Cloud Storage// into a BigQuery table and providing an explicit schema for the data.funcimportJSONExplicitSchema(projectID,datasetID,tableIDstring)error{// projectID := "my-project-id"// datasetID := "mydataset"// tableID := "mytable"ctx:=context.Background()client,err:=bigquery.NewClient(ctx,projectID)iferr!=nil{returnfmt.Errorf("bigquery.NewClient: %v",err)}deferclient.Close()gcsRef:=bigquery.NewGCSReference("gs://cloud-samples-data/bigquery/us-states/us-states.json")gcsRef.SourceFormat=bigquery.JSONgcsRef.Schema=bigquery.Schema{{Name:"name",Type:bigquery.StringFieldType},{Name:"post_abbr",Type:bigquery.StringFieldType},}loader:=client.Dataset(datasetID).Table(tableID).LoaderFrom(gcsRef)loader.WriteDisposition=bigquery.WriteEmptyjob,err:=loader.Run(ctx)iferr!=nil{returnerr}status,err:=job.Wait(ctx)iferr!=nil{returnerr}ifstatus.Err()!=nil{returnfmt.Errorf("job completed with error: %v",status.Err())}returnnil}

Java

Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

Use the LoadJobConfiguration.builder(tableId, sourceUri) method to start a load job from Cloud Storage. To use newline-delimited JSON, use LoadJobConfiguration.setFormatOptions(FormatOptions.json()).

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryException;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.Field;
import com.google.cloud.bigquery.FormatOptions;
import com.google.cloud.bigquery.Job;
import com.google.cloud.bigquery.JobInfo;
import com.google.cloud.bigquery.LoadJobConfiguration;
import com.google.cloud.bigquery.Schema;
import com.google.cloud.bigquery.StandardSQLTypeName;
import com.google.cloud.bigquery.TableId;

// Sample to load JSON data from Cloud Storage into a new BigQuery table
public class LoadJsonFromGCS {

  public static void runLoadJsonFromGCS() {
    // TODO(developer): Replace these variables before running the sample.
    String datasetName = "MY_DATASET_NAME";
    String tableName = "MY_TABLE_NAME";
    String sourceUri = "gs://cloud-samples-data/bigquery/us-states/us-states.json";
    Schema schema =
        Schema.of(
            Field.of("name", StandardSQLTypeName.STRING),
            Field.of("post_abbr", StandardSQLTypeName.STRING));
    loadJsonFromGCS(datasetName, tableName, sourceUri, schema);
  }

  public static void loadJsonFromGCS(
      String datasetName, String tableName, String sourceUri, Schema schema) {
    try {
      // Initialize client that will be used to send requests. This client only needs to be created
      // once, and can be reused for multiple requests.
      BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

      TableId tableId = TableId.of(datasetName, tableName);
      LoadJobConfiguration loadConfig =
          LoadJobConfiguration.newBuilder(tableId, sourceUri)
              .setFormatOptions(FormatOptions.json())
              .setSchema(schema)
              .build();

      // Load data from a GCS JSON file into the table
      Job job = bigquery.create(JobInfo.of(loadConfig));
      // Blocks until this load table job completes its execution, either failing or succeeding.
      job = job.waitFor();
      if (job.isDone()) {
        System.out.println("Json from GCS successfully loaded in a table");
      } else {
        System.out.println(
            "BigQuery was unable to load into the table due to an error:"
                + job.getStatus().getError());
      }
    } catch (BigQueryException | InterruptedException e) {
      System.out.println("Column not added during load append \n" + e.toString());
    }
  }
}

Node.js

Before trying this sample, follow the Node.js setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Node.js API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

// Import the Google Cloud client libraries
const {BigQuery} = require('@google-cloud/bigquery');
const {Storage} = require('@google-cloud/storage');

// Instantiate clients
const bigquery = new BigQuery();
const storage = new Storage();

/**
 * This sample loads the json file at
 * https://storage.googleapis.com/cloud-samples-data/bigquery/us-states/us-states.json
 *
 * TODO(developer): Replace the following lines with the path to your file.
 */
const bucketName = 'cloud-samples-data';
const filename = 'bigquery/us-states/us-states.json';

async function loadJSONFromGCS() {
  // Imports a GCS file into a table with manually defined schema.

  /**
   * TODO(developer): Uncomment the following lines before running the sample.
   */
  // const datasetId = "my_dataset";
  // const tableId = "my_table";

  // Configure the load job. For full list of options, see:
  // https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobConfigurationLoad
  const metadata = {
    sourceFormat: 'NEWLINE_DELIMITED_JSON',
    schema: {
      fields: [
        {name: 'name', type: 'STRING'},
        {name: 'post_abbr', type: 'STRING'},
      ],
    },
    location: 'US',
  };

  // Load data from a Google Cloud Storage file into the table
  const [job] = await bigquery
    .dataset(datasetId)
    .table(tableId)
    .load(storage.bucket(bucketName).file(filename), metadata);
  // load() waits for the job to finish
  console.log(`Job ${job.id} completed.`);

  // Check the job's status for errors
  const errors = job.status.errors;
  if (errors && errors.length > 0) {
    throw errors;
  }
}

PHP

Before trying this sample, follow the PHP setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery PHP API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

use Google\Cloud\BigQuery\BigQueryClient;
use Google\Cloud\Core\ExponentialBackoff;

/** Uncomment and populate these variables in your code */
// $projectId  = 'The Google project ID';
// $datasetId  = 'The BigQuery dataset ID';

// instantiate the bigquery table service
$bigQuery = new BigQueryClient([
    'projectId' => $projectId,
]);
$dataset = $bigQuery->dataset($datasetId);
$table = $dataset->table('us_states');

// create the import job
$gcsUri = 'gs://cloud-samples-data/bigquery/us-states/us-states.json';
$schema = [
    'fields' => [
        ['name' => 'name', 'type' => 'string'],
        ['name' => 'post_abbr', 'type' => 'string']
    ]
];
$loadConfig = $table->loadFromStorage($gcsUri)->schema($schema)->sourceFormat('NEWLINE_DELIMITED_JSON');
$job = $table->runJob($loadConfig);

// poll the job until it is complete
$backoff = new ExponentialBackoff(10);
$backoff->execute(function () use ($job) {
    print('Waiting for job to complete' . PHP_EOL);
    $job->reload();
    if (!$job->isComplete()) {
        throw new Exception('Job has not yet completed', 500);
    }
});

// check if the job has errors
if (isset($job->info()['status']['errorResult'])) {
    $error = $job->info()['status']['errorResult']['message'];
    printf('Error running job: %s' . PHP_EOL, $error);
} else {
    print('Data imported successfully' . PHP_EOL);
}

Python

Before trying this sample, follow the Python setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Python API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

Use the Client.load_table_from_uri() method to start a load job from Cloud Storage. To use JSONL, set the LoadJobConfig.source_format property to the string NEWLINE_DELIMITED_JSON and pass the job config as the job_config argument to the load_table_from_uri() method.
from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

# TODO(developer): Set table_id to the ID of the table to create.
# table_id = "your-project.your_dataset.your_table_name"

job_config = bigquery.LoadJobConfig(
    schema=[
        bigquery.SchemaField("name", "STRING"),
        bigquery.SchemaField("post_abbr", "STRING"),
    ],
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
)
uri = "gs://cloud-samples-data/bigquery/us-states/us-states.json"

load_job = client.load_table_from_uri(
    uri,
    table_id,
    location="US",  # Must match the destination dataset location.
    job_config=job_config,
)  # Make an API request.

load_job.result()  # Waits for the job to complete.

destination_table = client.get_table(table_id)
print("Loaded {} rows.".format(destination_table.num_rows))

Ruby

Before trying this sample, follow the Ruby setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Ruby API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

Use the Dataset.load_job() method to start a load job from Cloud Storage. To use JSONL, set the format parameter to "json".

require"google/cloud/bigquery"defload_table_gcs_jsondataset_id="your_dataset_id"bigquery=Google::Cloud::Bigquery.newdataset=bigquery.datasetdataset_idgcs_uri="gs://cloud-samples-data/bigquery/us-states/us-states.json"table_id="us_states"load_job=dataset.load_jobtable_id,gcs_uri,format:"json"do|schema|schema.string"name"schema.string"post_abbr"endputs"Starting job#{load_job.job_id}"load_job.wait_until_done!# Waits for table load to complete.puts"Job finished."table=dataset.tabletable_idputs"Loaded#{table.rows_count} rows to table#{table.id}"end

Loading nested and repeated JSON data

BigQuery supports loading nested and repeated data from source formats that support object-based schemas, such as JSON, Avro, ORC, Parquet, Firestore, and Datastore.

One JSON object, including any nested or repeated fields, must appear on each line.

The following example shows sample nested or repeated data. This table contains information about people. It consists of the following fields:

  • id
  • first_name
  • last_name
  • dob (date of birth)
  • addresses (a nested and repeated field)
    • addresses.status (current or previous)
    • addresses.address
    • addresses.city
    • addresses.state
    • addresses.zip
    • addresses.numberOfYears (years at the address)

The JSON data file would look like the following. Notice that the address field contains an array of values (indicated by [ ]).

{"id":"1","first_name":"John","last_name":"Doe","dob":"1968-01-22","addresses":[{"status":"current","address":"123 First Avenue","city":"Seattle","state":"WA","zip":"11111","numberOfYears":"1"},{"status":"previous","address":"456 Main Street","city":"Portland","state":"OR","zip":"22222","numberOfYears":"5"}]}{"id":"2","first_name":"Jane","last_name":"Doe","dob":"1980-10-16","addresses":[{"status":"current","address":"789 Any Avenue","city":"New York","state":"NY","zip":"33333","numberOfYears":"2"},{"status":"previous","address":"321 Main Street","city":"Hoboken","state":"NJ","zip":"44444","numberOfYears":"3"}]}

The schema for this table would look like the following:

[{"name":"id","type":"STRING","mode":"NULLABLE"},{"name":"first_name","type":"STRING","mode":"NULLABLE"},{"name":"last_name","type":"STRING","mode":"NULLABLE"},{"name":"dob","type":"DATE","mode":"NULLABLE"},{"name":"addresses","type":"RECORD","mode":"REPEATED","fields":[{"name":"status","type":"STRING","mode":"NULLABLE"},{"name":"address","type":"STRING","mode":"NULLABLE"},{"name":"city","type":"STRING","mode":"NULLABLE"},{"name":"state","type":"STRING","mode":"NULLABLE"},{"name":"zip","type":"STRING","mode":"NULLABLE"},{"name":"numberOfYears","type":"STRING","mode":"NULLABLE"}]}]

For information on specifying a nested and repeated schema, see Specifying nested and repeated fields.

Loading semi-structured JSON data

BigQuery supports loading semi-structured data, in which a field can take values of different types. The following example shows data similar to the preceding nested and repeated JSON data example, except that the address field can be a STRING, a STRUCT, or an ARRAY:

{"id":"1","first_name":"John","last_name":"Doe","dob":"1968-01-22","address":"123 First Avenue, Seattle WA 11111"}{"id":"2","first_name":"Jane","last_name":"Doe","dob":"1980-10-16","address":{"status":"current","address":"789 Any Avenue","city":"New York","state":"NY","zip":"33333","numberOfYears":"2"}}{"id":"3","first_name":"Bob","last_name":"Doe","dob":"1982-01-10","address":[{"status":"current","address":"789 Any Avenue","city":"New York","state":"NY","zip":"33333","numberOfYears":"2"}, "321 Main Street Hoboken NJ 44444"]}

You can load this data into BigQuery by using the following schema:

[{"name":"id","type":"STRING","mode":"NULLABLE"},{"name":"first_name","type":"STRING","mode":"NULLABLE"},{"name":"last_name","type":"STRING","mode":"NULLABLE"},{"name":"dob","type":"DATE","mode":"NULLABLE"},{"name":"address","type":"JSON","mode":"NULLABLE"}]

The address field is loaded into a column with type JSON that allows it to hold the mixed types in the example. You can ingest data as JSON whether it contains mixed types or not. For example, you could specify JSON instead of STRING as the type for the first_name field. For more information, see Working with JSON data in GoogleSQL.
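
A minimal Python sketch of loading with this schema might look like the following; the table ID and source URI are placeholders, and the JSON type string is passed through to the load job as declared in the schema above.

from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical destination table and source URI.
table_id = "your-project.your_dataset.people_semi_structured"
uri = "gs://mybucket/people.json"

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    schema=[
        bigquery.SchemaField("id", "STRING"),
        bigquery.SchemaField("first_name", "STRING"),
        bigquery.SchemaField("last_name", "STRING"),
        bigquery.SchemaField("dob", "DATE"),
        # A JSON column accepts the mixed string, object, and array values.
        bigquery.SchemaField("address", "JSON"),
    ],
)

client.load_table_from_uri(uri, table_id, job_config=job_config).result()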

Appending to or overwriting a table with JSON data

You can load additional data into a table either from source files or by appending query results.

In the Google Cloud console, use the Write preference option to specify what action to take when you load data from a source file or from a query result.

You have the following options when you load additional data into a table:

Console option | bq tool flag | BigQuery API property | Description
Write if empty | Not supported | WRITE_EMPTY | Writes the data only if the table is empty.
Append to table | --noreplace or --replace=false; if --[no]replace is unspecified, the default is append | WRITE_APPEND | (Default) Appends the data to the end of the table.
Overwrite table | --replace or --replace=true | WRITE_TRUNCATE | Erases all existing data in a table before writing the new data. This action also deletes the table schema and row-level security, and removes any Cloud KMS key.

If you load data into an existing table, the load job can append the data or overwrite the table.

You can append or overwrite a table by using one of the following:

  • The Google Cloud console
  • The bq command-line tool's bq load command
  • The jobs.insert API method and configuring a load job
  • The client libraries
Note: This page does not cover appending or overwriting partitioned tables. For information on appending and overwriting partitioned tables, see Appending to and overwriting partitioned table data.

Console

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the left pane, click Explorer.
  3. In the Explorer pane, expand your project, click Datasets, and then select a dataset.
  4. In the Dataset info section, click Create table.
  5. In the Create table pane, specify the following details:
    1. In the Source section, select Google Cloud Storage in the Create table from list. Then, do the following:
      1. Select a file from the Cloud Storage bucket, or enter the Cloud Storage URI. You cannot include multiple URIs in the Google Cloud console, but wildcards are supported. The Cloud Storage bucket must be in the same location as the dataset that contains the table you want to create, append, or overwrite.
      2. For File format, select JSONL (Newline delimited JSON).
    2. Note: It is possible to modify the table's schema when you append or overwrite it. For more information about supported schema changes during a load operation, see Modifying table schemas.
    3. In the Destination section, specify the following details:
      1. For Dataset, select the dataset in which you want to create the table.
      2. In the Table field, enter the name of the table that you want to create.
      3. Verify that the Table type field is set to Native table.
    4. In the Schema section, enter the schema definition. To enable the auto detection of a schema, select Auto detect. You can enter schema information manually by using one of the following methods:
    5. Optional: Specify Partition and cluster settings. For more information, see Creating partitioned tables and Creating and using clustered tables. You cannot convert a table to a partitioned or clustered table by appending or overwriting it. The Google Cloud console does not support appending to or overwriting partitioned or clustered tables in a load job.
    6. Click Advanced options and do the following:
      • For Write preference, choose Append to table or Overwrite table.
      • For Number of errors allowed, accept the default value of 0 or enter the maximum number of rows containing errors that can be ignored. If the number of rows with errors exceeds this value, the job results in an invalid message and fails. This option applies only to CSV and JSON files.
      • For Time zone, enter the default time zone to apply when parsing timestamp values that have no specific time zone. Check here for more valid time zone names. If this value is not present, timestamp values without a specific time zone are parsed using the default time zone UTC. (Preview).
      • For Date Format, enter the format elements that define how the DATE values are formatted in the input files. This field expects SQL-style format elements (for example, MM/DD/YYYY). If this value is present, this format is the only compatible DATE format. Schema autodetection also decides the DATE column type based on this format instead of the existing format. If this value is not present, the DATE field is parsed with the default formats. (Preview).
      • For Datetime Format, enter the format elements that define how the DATETIME values are formatted in the input files. This field expects SQL-style format elements (for example, MM/DD/YYYY HH24:MI:SS.FF3). If this value is present, this format is the only compatible DATETIME format. Schema autodetection also decides the DATETIME column type based on this format instead of the existing format. If this value is not present, the DATETIME field is parsed with the default formats. (Preview).
      • For Time Format, enter the format elements that define how the TIME values are formatted in the input files. This field expects SQL-style format elements (for example, HH24:MI:SS.FF3). If this value is present, this format is the only compatible TIME format. Schema autodetection also decides the TIME column type based on this format instead of the existing format. If this value is not present, the TIME field is parsed with the default formats. (Preview).
      • For Timestamp Format, enter the format elements that define how the TIMESTAMP values are formatted in the input files. This field expects SQL-style format elements (for example, MM/DD/YYYY HH24:MI:SS.FF3). If this value is present, this format is the only compatible TIMESTAMP format. Schema autodetection also decides the TIMESTAMP column type based on this format instead of the existing format. If this value is not present, the TIMESTAMP field is parsed with the default formats. (Preview).
      • If you want to ignore values in a row that are not present in the table's schema, then select Unknown values.
      • For Encryption, click Customer-managed key to use a Cloud Key Management Service key. If you leave the Google-managed key setting, BigQuery encrypts the data at rest.
    7. Click Create table.

SQL

Use the LOAD DATA DDL statement. The following example appends a JSON file to the table mytable:

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the query editor, enter the following statement:

    LOAD DATA INTO mydataset.mytable
    FROM FILES (
      format = 'JSON',
      uris = ['gs://bucket/path/file.json']);

  3. ClickRun.

For more information about how to run queries, see Run an interactive query.

bq

Use the bq load command, specify NEWLINE_DELIMITED_JSON using the --source_format flag, and include a Cloud Storage URI. You can include a single URI, a comma-separated list of URIs, or a URI containing a wildcard.

Supply the schema inline, in a schema definition file, or use schema auto-detect.

Specify the --replace flag to overwrite the table. Use the --noreplace flag to append data to the table. If no flag is specified, the default is to append data.

It is possible to modify the table's schema when you append or overwrite it. For more information on supported schema changes during a load operation, see Modifying table schemas.

(Optional) Supply the --location flag and set the value to your location.

Other optional flags include:

  • --max_bad_records: An integer that specifies the maximum number of bad records allowed before the entire job fails. The default value is 0. At most, five errors of any type are returned regardless of the --max_bad_records value.
  • --ignore_unknown_values: When specified, allows and ignores extra, unrecognized values in CSV or JSON data.
  • --time_zone: (Preview) An optional default time zone that applies when parsing timestamp values that have no specific time zone in CSV or JSON data.
  • --date_format: (Preview) An optional custom string that defines how the DATE values are formatted in CSV or JSON data.
  • --datetime_format: (Preview) An optional custom string that defines how the DATETIME values are formatted in CSV or JSON data.
  • --time_format: (Preview) An optional custom string that defines how the TIME values are formatted in CSV or JSON data.
  • --timestamp_format: (Preview) An optional custom string that defines how the TIMESTAMP values are formatted in CSV or JSON data.
  • --autodetect: When specified, enables schema auto-detection for CSV and JSON data.
  • --destination_kms_key: The Cloud KMS key for encryption of the table data.

bq --location=LOCATION load \
--[no]replace \
--source_format=FORMAT \
DATASET.TABLE \
PATH_TO_SOURCE \
SCHEMA

Replace the following:

  • LOCATION: your location. The --location flag is optional. You can set a default value for the location using the .bigqueryrc file.
  • FORMAT: NEWLINE_DELIMITED_JSON.
  • DATASET: an existing dataset.
  • TABLE: the name of the table into which you're loading data.
  • PATH_TO_SOURCE: a fully qualified Cloud Storage URI or a comma-separated list of URIs. Wildcards are also supported.
  • SCHEMA: a valid schema. The schema can be a local JSON file, or it can be typed inline as part of the command. You can also use the --autodetect flag instead of supplying a schema definition.

Examples:

The following command loads data from gs://mybucket/mydata.json and overwrites a table named mytable in mydataset. The schema is defined using schema auto-detection.

bq load \
--autodetect \
--replace \
--source_format=NEWLINE_DELIMITED_JSON \
mydataset.mytable \
gs://mybucket/mydata.json

The following command loads data from gs://mybucket/mydata.json and appends data to a table named mytable in mydataset. The schema is defined using a JSON schema file named myschema.

bq load \
--noreplace \
--source_format=NEWLINE_DELIMITED_JSON \
mydataset.mytable \
gs://mybucket/mydata.json \
./myschema

API

  1. Create a load job that points to the source data in Cloud Storage.

  2. (Optional) Specify your location in the location property in the jobReference section of the job resource.

  3. The source URIs property must be fully qualified, in the format gs://BUCKET/OBJECT. You can include multiple URIs as a comma-separated list. Wildcards are also supported.

  4. Specify the data format by setting the configuration.load.sourceFormat property to NEWLINE_DELIMITED_JSON.

  5. Specify the write preference by setting the configuration.load.writeDisposition property to WRITE_TRUNCATE or WRITE_APPEND.

Go

Before trying this sample, follow the Go setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Go API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

import("context""fmt""cloud.google.com/go/bigquery")// importJSONTruncate demonstrates loading data from newline-delimeted JSON data in Cloud Storage// and overwriting/truncating data in the existing table.funcimportJSONTruncate(projectID,datasetID,tableIDstring)error{// projectID := "my-project-id"// datasetID := "mydataset"// tableID := "mytable"ctx:=context.Background()client,err:=bigquery.NewClient(ctx,projectID)iferr!=nil{returnfmt.Errorf("bigquery.NewClient: %v",err)}deferclient.Close()gcsRef:=bigquery.NewGCSReference("gs://cloud-samples-data/bigquery/us-states/us-states.json")gcsRef.SourceFormat=bigquery.JSONgcsRef.AutoDetect=trueloader:=client.Dataset(datasetID).Table(tableID).LoaderFrom(gcsRef)loader.WriteDisposition=bigquery.WriteTruncatejob,err:=loader.Run(ctx)iferr!=nil{returnerr}status,err:=job.Wait(ctx)iferr!=nil{returnerr}ifstatus.Err()!=nil{returnfmt.Errorf("job completed with error: %v",status.Err())}returnnil}

Java

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryException;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.Field;
import com.google.cloud.bigquery.FormatOptions;
import com.google.cloud.bigquery.Job;
import com.google.cloud.bigquery.JobInfo;
import com.google.cloud.bigquery.LoadJobConfiguration;
import com.google.cloud.bigquery.Schema;
import com.google.cloud.bigquery.StandardSQLTypeName;
import com.google.cloud.bigquery.TableId;

// Sample to overwrite the BigQuery table data by loading a JSON file from GCS
public class LoadJsonFromGCSTruncate {

  public static void runLoadJsonFromGCSTruncate() {
    // TODO(developer): Replace these variables before running the sample.
    String datasetName = "MY_DATASET_NAME";
    String tableName = "MY_TABLE_NAME";
    String sourceUri = "gs://cloud-samples-data/bigquery/us-states/us-states.json";
    Schema schema =
        Schema.of(
            Field.of("name", StandardSQLTypeName.STRING),
            Field.of("post_abbr", StandardSQLTypeName.STRING));
    loadJsonFromGCSTruncate(datasetName, tableName, sourceUri, schema);
  }

  public static void loadJsonFromGCSTruncate(
      String datasetName, String tableName, String sourceUri, Schema schema) {
    try {
      // Initialize client that will be used to send requests. This client only needs to be created
      // once, and can be reused for multiple requests.
      BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

      TableId tableId = TableId.of(datasetName, tableName);
      LoadJobConfiguration loadConfig =
          LoadJobConfiguration.newBuilder(tableId, sourceUri)
              .setFormatOptions(FormatOptions.json())
              // Set the write disposition to overwrite existing table data
              .setWriteDisposition(JobInfo.WriteDisposition.WRITE_TRUNCATE)
              .setSchema(schema)
              .build();

      // Load data from a GCS JSON file into the table
      Job job = bigquery.create(JobInfo.of(loadConfig));
      // Blocks until this load table job completes its execution, either failing or succeeding.
      job = job.waitFor();
      if (job.isDone()) {
        System.out.println("Table is successfully overwritten by JSON file loaded from GCS");
      } else {
        System.out.println(
            "BigQuery was unable to load into the table due to an error:"
                + job.getStatus().getError());
      }
    } catch (BigQueryException | InterruptedException e) {
      System.out.println("Column not added during load append \n" + e.toString());
    }
  }
}

Node.js

Before trying this sample, follow the Node.js setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Node.js API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

// Import the Google Cloud client libraries
const {BigQuery} = require('@google-cloud/bigquery');
const {Storage} = require('@google-cloud/storage');

// Instantiate clients
const bigquery = new BigQuery();
const storage = new Storage();

/**
 * This sample loads the JSON file at
 * https://storage.googleapis.com/cloud-samples-data/bigquery/us-states/us-states.json
 *
 * TODO(developer): Replace the following lines with the path to your file.
 */
const bucketName = 'cloud-samples-data';
const filename = 'bigquery/us-states/us-states.json';

async function loadJSONFromGCSTruncate() {
  /**
   * Imports a GCS file into a table and overwrites
   * table data if table already exists.
   */

  /**
   * TODO(developer): Uncomment the following lines before running the sample.
   */
  // const datasetId = "my_dataset";
  // const tableId = "my_table";

  // Configure the load job. For full list of options, see:
  // https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobConfigurationLoad
  const metadata = {
    sourceFormat: 'NEWLINE_DELIMITED_JSON',
    schema: {
      fields: [
        {name: 'name', type: 'STRING'},
        {name: 'post_abbr', type: 'STRING'},
      ],
    },
    // Set the write disposition to overwrite existing table data.
    writeDisposition: 'WRITE_TRUNCATE',
  };

  // Load data from a Google Cloud Storage file into the table
  const [job] = await bigquery
    .dataset(datasetId)
    .table(tableId)
    .load(storage.bucket(bucketName).file(filename), metadata);
  // load() waits for the job to finish
  console.log(`Job ${job.id} completed.`);

  // Check the job's status for errors
  const errors = job.status.errors;
  if (errors && errors.length > 0) {
    throw errors;
  }
}

PHP

Before trying this sample, follow the PHP setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery PHP API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

use Google\Cloud\BigQuery\BigQueryClient;
use Google\Cloud\Core\ExponentialBackoff;

/** Uncomment and populate these variables in your code */
// $projectId = 'The Google project ID';
// $datasetId = 'The BigQuery dataset ID';
// $tableID = 'The BigQuery table ID';

// instantiate the bigquery table service
$bigQuery = new BigQueryClient([
    'projectId' => $projectId,
]);
$table = $bigQuery->dataset($datasetId)->table($tableId);

// create the import job
$gcsUri = 'gs://cloud-samples-data/bigquery/us-states/us-states.json';
$loadConfig = $table->loadFromStorage($gcsUri)->sourceFormat('NEWLINE_DELIMITED_JSON')->writeDisposition('WRITE_TRUNCATE');
$job = $table->runJob($loadConfig);

// poll the job until it is complete
$backoff = new ExponentialBackoff(10);
$backoff->execute(function () use ($job) {
    print('Waiting for job to complete' . PHP_EOL);
    $job->reload();
    if (!$job->isComplete()) {
        throw new Exception('Job has not yet completed', 500);
    }
});

// check if the job has errors
if (isset($job->info()['status']['errorResult'])) {
    $error = $job->info()['status']['errorResult']['message'];
    printf('Error running job: %s' . PHP_EOL, $error);
} else {
    print('Data imported successfully' . PHP_EOL);
}

Python

To replace the rows in an existing table, set the LoadJobConfig.write_disposition property to the string WRITE_TRUNCATE.

Before trying this sample, follow the Python setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Python API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

import io

from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

# TODO(developer): Set table_id to the ID of the table to create.
# table_id = "your-project.your_dataset.your_table_name"

job_config = bigquery.LoadJobConfig(
    schema=[
        bigquery.SchemaField("name", "STRING"),
        bigquery.SchemaField("post_abbr", "STRING"),
    ],
)

body = io.BytesIO(b"Washington,WA")
client.load_table_from_file(body, table_id, job_config=job_config).result()
previous_rows = client.get_table(table_id).num_rows
assert previous_rows > 0

job_config = bigquery.LoadJobConfig(
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
)

uri = "gs://cloud-samples-data/bigquery/us-states/us-states.json"
load_job = client.load_table_from_uri(
    uri, table_id, job_config=job_config
)  # Make an API request.

load_job.result()  # Waits for the job to complete.

destination_table = client.get_table(table_id)
print("Loaded {} rows.".format(destination_table.num_rows))
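
To append to the existing table instead of overwriting it, a minimal variation of this sample sets the write disposition to WRITE_APPEND; the table ID and source URI are placeholders.

from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical destination table and source URI.
table_id = "your-project.your_dataset.your_table_name"
uri = "gs://mybucket/mydata.json"

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    # WRITE_APPEND adds the loaded rows to the table's existing data.
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    autodetect=True,
)

load_job = client.load_table_from_uri(uri, table_id, job_config=job_config)
load_job.result()  # Waits for the job to complete.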

Ruby

To replace the rows in an existing table, set the write parameter of Table.load_job() to "WRITE_TRUNCATE".

Before trying this sample, follow the Ruby setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Ruby API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

require"google/cloud/bigquery"defload_table_gcs_json_truncatedataset_id="your_dataset_id",table_id="your_table_id"bigquery=Google::Cloud::Bigquery.newdataset=bigquery.datasetdataset_idgcs_uri="gs://cloud-samples-data/bigquery/us-states/us-states.json"load_job=dataset.load_jobtable_id,gcs_uri,format:"json",write:"truncate"puts"Starting job#{load_job.job_id}"load_job.wait_until_done!# Waits for table load to complete.puts"Job finished."table=dataset.tabletable_idputs"Loaded#{table.rows_count} rows to table#{table.id}"end

Loading hive-partitioned JSON data

BigQuery supports loading hive partitioned JSON data stored on Cloud Storage and populates the hive partitioning columns as columns in the destination BigQuery managed table. For more information, see Loading externally partitioned data.

Details of loading JSON data

This section describes how BigQuery parses various data types when loading JSON data.

Data types

Boolean. BigQuery can parse any of the following pairs for Boolean data: 1 or 0, true or false, t or f, yes or no, or y or n (all case insensitive). Schema autodetection automatically detects any of these except 0 and 1.

Bytes. Columns with BYTES types must be encoded as Base64.

Date. Columns with DATE types must be in the format YYYY-MM-DD.

Datetime. Columns with DATETIME types must be in the format YYYY-MM-DD HH:MM:SS[.SSSSSS].

Geography. Columns with GEOGRAPHY types must contain strings in one of the following formats:

  • Well-known text (WKT)
  • Well-known binary (WKB)
  • GeoJSON

If you use WKB, the value should be hex encoded.

The following list shows examples of valid data:

  • WKT: POINT(1 2)
  • GeoJSON: { "type": "Point", "coordinates": [1, 2] }
  • Hex encoded WKB: 0101000000feffffffffffef3f0000000000000040

Before loading GEOGRAPHY data, also read Loading geospatial data.

Interval. Columns with INTERVAL types must be in ISO 8601 format PYMDTHMS, where:

  • P = Designator that indicates that the value represents a duration. You must always include this.
  • Y = Year
  • M = Month
  • D = Day
  • T = Designator that denotes the time portion of the duration. You must always include this.
  • H = Hour
  • M = Minute
  • S = Second. Seconds can be denoted as a whole value or as a fractional value of up to six digits, at microsecond precision.

You can indicate a negative value by prepending a dash (-).

The following list shows examples of valid data:

  • P-10000Y0M-3660000DT-87840000H0M0S
  • P0Y0M0DT0H0M0.000001S
  • P10000Y0M3660000DT87840000H0M0S

To load INTERVAL data, you must use the bq load command and use the --schema flag to specify a schema. You can't upload INTERVAL data by using the console.

Time. Columns with TIME types must be in the format HH:MM:SS[.SSSSSS].

Timestamp. BigQuery accepts various timestamp formats. The timestamp must include a date portion and a time portion.

  • The date portion can be formatted as YYYY-MM-DD or YYYY/MM/DD.

  • The timestamp portion must be formatted as HH:MM[:SS[.SSSSSS]] (seconds and fractions of seconds are optional).

  • The date and time must be separated by a space or 'T'.

  • Optionally, the date and time can be followed by a UTC offset or the UTC zone designator (Z). For more information, see Time zones.

For example, any of the following are valid timestamp values:

  • 2018-08-19 12:11
  • 2018-08-19 12:11:35
  • 2018-08-19 12:11:35.22
  • 2018/08/19 12:11
  • 2018-07-05 12:54:00 UTC
  • 2018-08-19 07:11:35.220 -05:00
  • 2018-08-19T12:11:35.220Z

If you provide a schema, BigQuery also accepts Unix epoch time for timestamp values. However, schema autodetection doesn't detect this case, and treats the value as a numeric or string type instead. A client-library sketch of loading epoch values with an explicit schema follows the examples below.

Examples of Unix epoch timestamp values:

  • 1534680695
  • 1.534680695e12
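
The following Python sketch supplies an explicit TIMESTAMP schema so that epoch values load as timestamps rather than being autodetected as numbers; the table ID and sample rows are placeholders.

import io

from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical destination table.
table_id = "your-project.your_dataset.events"

# Each row stores Unix epoch seconds in the created_at field.
ndjson = io.BytesIO(
    b'{"event": "signup", "created_at": 1534680695}\n'
    b'{"event": "login", "created_at": 1534680701}\n'
)

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    # The explicit schema tells BigQuery to interpret the numbers as timestamps.
    schema=[
        bigquery.SchemaField("event", "STRING"),
        bigquery.SchemaField("created_at", "TIMESTAMP"),
    ],
)

client.load_table_from_file(ndjson, table_id, job_config=job_config).result()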

Array (repeated field). The value must be a JSON array or null. JSON null is converted to SQL NULL. The array itself cannot contain null values.

Schema auto-detection

This section describes the behavior of schema auto-detection when loading JSON files.

JSON nested and repeated fields

BigQuery infers nested and repeated fields in JSON files. If a field value is a JSON object, then BigQuery loads the column as a RECORD type. If a field value is an array, then BigQuery loads the column as a repeated column. For an example of JSON data with nested and repeated data, see Loading nested and repeated JSON data.

String conversion

If you enable schema auto-detection, then BigQuery converts strings into Boolean, numeric, or date/time types when possible. For example, using the following JSON data, schema auto-detection converts the id field to an INTEGER column:

{"name":"Alice","id":"12"}{"name":"Bob","id":"34"}{"name":"Charles","id":"45"}

Encoding types

BigQuery expects JSON data to be UTF-8 encoded. If you have JSON files with other supported encoding types, you should explicitly specify the encoding by using the --encoding flag so that BigQuery converts the data to UTF-8. A client-library sketch of setting the encoding follows the list of supported encodings.

BigQuery supports the following encoding types for JSON files:

  • UTF-8
  • ISO-8859-1
  • UTF-16BE (UTF-16 Big Endian)
  • UTF-16LE (UTF-16 Little Endian)
  • UTF-32BE (UTF-32 Big Endian)
  • UTF-32LE (UTF-32 Little Endian)
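
As an illustration, the bq --encoding flag corresponds to the encoding property in the load job configuration; the following Python sketch assumes an ISO-8859-1 encoded source file and uses placeholder names.

from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical destination table and source URI of an ISO-8859-1 encoded file.
table_id = "your-project.your_dataset.your_table_name"
uri = "gs://mybucket/latin1-data.json"

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    # Declare the source encoding so BigQuery converts the data to UTF-8.
    encoding="ISO-8859-1",
    autodetect=True,
)

client.load_table_from_uri(uri, table_id, job_config=job_config).result()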

JSON options

To change how BigQuery parses JSON data, specify additional options in the Google Cloud console, the bq command-line tool, the API, or the client libraries.

JSON option | Console option | bq tool flag | BigQuery API property | Description
Number of bad records allowed | Number of errors allowed | --max_bad_records | maxBadRecords (Java, Python) | (Optional) The maximum number of bad records that BigQuery can ignore when running the job. If the number of bad records exceeds this value, an invalid error is returned in the job result. The default value is 0, which requires that all records are valid.
Unknown values | Ignore unknown values | --ignore_unknown_values | ignoreUnknownValues (Java, Python) | (Optional) Indicates whether BigQuery should allow extra values that are not represented in the table schema. If true, the extra values are ignored. If false, records with extra columns are treated as bad records, and if there are too many bad records, an invalid error is returned in the job result. The default value is false. The sourceFormat property determines what BigQuery treats as an extra value: CSV: trailing columns, JSON: named values that don't match any column names.
Encoding | None | -E or --encoding | encoding (Python) | (Optional) The character encoding of the data. The supported values are UTF-8, ISO-8859-1, UTF-16BE, UTF-16LE, UTF-32BE, or UTF-32LE. The default value is UTF-8.
Time Zone | Time Zone | --time_zone | None | (Preview) (Optional) Default time zone that is applied when parsing timestamp values that have no specific time zone. Check valid time zone names. If this value is not present, timestamp values without a specific time zone are parsed using the default time zone UTC.
Date Format | Date Format | --date_format | None | (Preview) (Optional) Format elements that define how the DATE values are formatted in the input files (for example, MM/DD/YYYY). If this value is present, this format is the only compatible DATE format. Schema autodetection also decides the DATE column type based on this format instead of the existing format. If this value is not present, the DATE field is parsed with the default formats.
Datetime Format | Datetime Format | --datetime_format | None | (Preview) (Optional) Format elements that define how the DATETIME values are formatted in the input files (for example, MM/DD/YYYY HH24:MI:SS.FF3). If this value is present, this format is the only compatible DATETIME format. Schema autodetection also decides the DATETIME column type based on this format instead of the existing format. If this value is not present, the DATETIME field is parsed with the default formats.
Time Format | Time Format | --time_format | None | (Preview) (Optional) Format elements that define how the TIME values are formatted in the input files (for example, HH24:MI:SS.FF3). If this value is present, this format is the only compatible TIME format. Schema autodetection also decides the TIME column type based on this format instead of the existing format. If this value is not present, the TIME field is parsed with the default formats.
Timestamp Format | Timestamp Format | --timestamp_format | None | (Preview) (Optional) Format elements that define how the TIMESTAMP values are formatted in the input files (for example, MM/DD/YYYY HH24:MI:SS.FF3). If this value is present, this format is the only compatible TIMESTAMP format. Schema autodetection also decides the TIMESTAMP column type based on this format instead of the existing format. If this value is not present, the TIMESTAMP field is parsed with the default formats.
