Batch loading data

You can load data into BigQuery from Cloud Storage or from a local file as a batch operation. The source data can be in any of the following formats:

  • Avro
  • Comma-separated values (CSV)
  • JSON (newline-delimited)
  • ORC
  • Parquet
  • Datastore exports stored in Cloud Storage
  • Firestore exports stored in Cloud Storage

You can also use the BigQuery Data Transfer Service to set up recurring loads from Cloud Storage into BigQuery.

Before you begin

Grant Identity and Access Management (IAM) roles that give users the necessary permissions to perform each task in this document, and create a dataset to store your data.

Required permissions

To load data into BigQuery, you need IAM permissions to run a load job and load data into BigQuery tables and partitions. If you are loading data from Cloud Storage, you also need IAM permissions to access the bucket that contains your data.

Permissions to load data into BigQuery

To load data into a new BigQuery table or partition or to append or overwrite an existing table or partition, you need the following IAM permissions:

  • bigquery.tables.create
  • bigquery.tables.updateData
  • bigquery.tables.update
  • bigquery.jobs.create

Each of the following predefined IAM roles includes the permissions that you need in order to load data into a BigQuery table or partition:

  • roles/bigquery.dataEditor
  • roles/bigquery.dataOwner
  • roles/bigquery.admin (includes the bigquery.jobs.create permission)
  • bigquery.user (includes the bigquery.jobs.create permission)
  • bigquery.jobUser (includes the bigquery.jobs.create permission)

Additionally, if you have the bigquery.datasets.create permission, you can create and update tables using a load job in the datasets that you create.

For more information on IAM roles and permissions in BigQuery, see Predefined roles and permissions.

Permissions to load data from Cloud Storage

To get the permissions that you need to load data from a Cloud Storage bucket, ask your administrator to grant you the Storage Admin (roles/storage.admin) IAM role on the bucket. For more information about granting roles, see Manage access to projects, folders, and organizations.

This predefined role contains the permissions required to load data from a Cloud Storage bucket. To see the exact permissions that are required, expand the Required permissions section:

Required permissions

The following permissions are required to load data from a Cloud Storage bucket:

  • storage.buckets.get
  • storage.objects.get
  • storage.objects.list (required if you are using a URI wildcard)

You might also be able to get these permissions with custom roles or other predefined roles.

Create a dataset

Create a BigQuery dataset to store your data.
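
If you prefer to create the dataset programmatically, the following is a minimal Python sketch; the project ID, dataset ID, and location shown here are placeholders, not values from this guide:

from google.cloud import bigquery

client = bigquery.Client()

# Placeholder dataset ID in "project.dataset" form.
dataset = bigquery.Dataset("your-project.your_dataset")
dataset.location = "US"  # Choose the location where your data will live.

# exists_ok=True avoids an error if the dataset already exists.
dataset = client.create_dataset(dataset, exists_ok=True)
print(f"Created dataset {dataset.full_dataset_id}")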

Loading data from Cloud Storage

BigQuery supports loading data from any of the following Cloud Storage storage classes:

  • Standard
  • Nearline
  • Coldline
  • Archive

To learn how to load data into BigQuery, see the page for your data format.

To learn how to configure a recurring load from Cloud Storage into BigQuery, see Cloud Storage transfers.

Location considerations

You cannot change the location of a dataset after it is created, but you can make a copy of the dataset or manually move it.

Retrieving the Cloud Storage URI

To load data from a Cloud Storage data source, you must provide the Cloud Storage URI.

The Cloud Storage resource path contains your bucket name and your object (filename). For example, if the Cloud Storage bucket is named mybucket and the data file is named myfile.csv, the resource path would be gs://mybucket/myfile.csv.

BigQuery does not support Cloud Storage resource paths that include multiple consecutive slashes after the initial double slash. Cloud Storage object names can contain multiple consecutive slash ("/") characters. However, BigQuery converts multiple consecutive slashes into a single slash. For example, the following resource path, though valid in Cloud Storage, does not work in BigQuery: gs://bucket/my//object//name.

To retrieve the Cloud Storage resource path:

  1. Open the Cloud Storage console.

    Cloud Storage console

  2. Browse to the location of the object (file) that contains the source data.

  3. Click on the name of the object.

    The Object details page opens.

  4. Copy the value provided in the gsutil URI field, which begins with gs://.

Note: You can also use the gcloud storage ls command to list buckets or objects.
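
If you prefer to list objects programmatically rather than in the console, the Cloud Storage client libraries can enumerate objects and print their URIs. The following Python sketch assumes a placeholder bucket name and prefix:

from google.cloud import storage

client = storage.Client()

# Placeholder bucket and prefix; prints a gs:// URI for each matching object.
for blob in client.list_blobs("mybucket", prefix="fed-samples/"):
    print(f"gs://mybucket/{blob.name}")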

For Google Datastore exports, only one URI can be specified, and it must end with .backup_info or .export_metadata.

Wildcard support for Cloud Storage URIs

If your data is separated into multiple files, you can use an asterisk (*) wildcard to select multiple files. Use of the asterisk wildcard must follow these rules:

  • The asterisk can appear inside the object name or at the end of the object name.
  • Using multiple asterisks is unsupported. For example, the path gs://mybucket/fed-*/temp/*.csv is invalid.
  • Using an asterisk with the bucket name is unsupported.

Examples:

  • The following example shows how to select all of the files in all the folders that start with the prefix gs://mybucket/fed-samples/fed-sample:

    gs://mybucket/fed-samples/fed-sample*
  • The following example shows how to select only files with a .csv extension in the folder named fed-samples and any subfolders of fed-samples:

    gs://mybucket/fed-samples/*.csv
  • The following example shows how to select files with a naming pattern of fed-sample*.csv in the folder named fed-samples. This example doesn't select files in subfolders of fed-samples.

    gs://mybucket/fed-samples/fed-sample*.csv

When using the bq command-line tool, you might need to escape the asterisk on some platforms.

You can't use an asterisk wildcard when you load Datastore or Firestore export data from Cloud Storage.
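
For the formats that do support wildcards, a wildcard URI like the ones shown above can be passed directly to a load job. The following Python sketch is illustrative only; the bucket, prefix, and table ID are placeholders, and the job settings assume CSV files with a header row:

from google.cloud import bigquery

client = bigquery.Client()

# Placeholder table ID and wildcard URI.
table_id = "your-project.mydataset.mytable"
uri = "gs://mybucket/fed-samples/fed-sample*.csv"

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,  # Skip the header row in each file.
    autodetect=True,
)

load_job = client.load_table_from_uri(uri, table_id, job_config=job_config)
load_job.result()  # Waits for the load job to complete.
print(f"Loaded {client.get_table(table_id).num_rows} rows.")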

Limitations

You are subject to the following limitations when you load data into BigQuery from a Cloud Storage bucket:

  • BigQuery does not guarantee data consistency for external data sources. Changes to the underlying data while a query is running can result in unexpected behavior.
  • BigQuery doesn't support Cloud Storage object versioning. If you include a generation number in the Cloud Storage URI, then the load job fails.

Depending on the format of your Cloud Storage source data, there might be additional limitations. For more information, see the limitations for your data format.

Loading data from local files

You can load data from a readable data source (such as your local machine) by using one of the following:

  • The Google Cloud console
  • The bq command-line tool's bq load command
  • The API
  • The client libraries

When you load data using the Google Cloud console or the bq command-line tool, a load job is automatically created.

To load data from a local data source:

Console

  1. Open the BigQuery page in the Google Cloud console.

    Go to the BigQuery page

  2. In the left pane, click Explorer.

    If you don't see the left pane, click Expand left pane to open the pane.

  3. In the Explorer pane, expand your project, click Datasets, and then select a dataset.

  4. In the details pane, click Create table.

  5. On the Create table page, in the Source section:

    • For Create table from, select Upload.
    • For Select file, click Browse.
    • Browse to the file, and click Open. Note that wildcards and comma-separated lists are not supported for local files.
    • For File format, select CSV, JSON (newline delimited), Avro, Parquet, or ORC.
  6. On the Create table page, in the Destination section:

    • For Project, choose the appropriate project.
    • For Dataset, choose the appropriate dataset.
    • In the Table field, enter the name of the table you're creating in BigQuery.
    • Verify that Table type is set to Native table.
  7. In the Schema section, enter the schema definition.

  8. Select applicable items in the Advanced options section. For information on the available options, see CSV options and JSON options.

  9. Optional: In the Advanced options section, choose the write disposition:

    • Write if empty: Write the data only if the table is empty.
    • Append to table: Append the data to the end of the table. This setting is the default.
    • Overwrite table: Erase all existing data in the table before writing the new data.
  10. Click Create table.

bq

Use the bq load command, specify the source_format, and include the path to the local file.

(Optional) Supply the --location flag and set the value to your location.

If you are loading data in a project other than your default project, add the project ID to the dataset in the following format: PROJECT_ID:DATASET.

bq --location=LOCATION load \
--source_format=FORMAT \
PROJECT_ID:DATASET.TABLE \
PATH_TO_SOURCE \
SCHEMA

Replace the following:

  • LOCATION: your location. The --location flag is optional. For example, if you are using BigQuery in the Tokyo region, set the flag's value to asia-northeast1. You can set a default value for the location by using the .bigqueryrc file.
  • FORMAT: CSV, AVRO, PARQUET, ORC, or NEWLINE_DELIMITED_JSON.
  • PROJECT_ID: your project ID.
  • DATASET: an existing dataset.
  • TABLE: the name of the table into which you're loading data.
  • PATH_TO_SOURCE: the path to the local file.
  • SCHEMA: a valid schema. The schema can be a local JSON file, or it can be typed inline as part of the command. You can also use the --autodetect flag instead of supplying a schema definition.

In addition, you can add flags for options that let you control how BigQuery parses your data. For example, you can use the --skip_leading_rows flag to ignore header rows in a CSV file. For more information, see CSV options and JSON options.

Examples:

The following command loads a local newline-delimited JSON file (mydata.json) into a table named mytable in mydataset in your default project. The schema is defined in a local schema file named myschema.json.

bq load \
--source_format=NEWLINE_DELIMITED_JSON \
mydataset.mytable \
./mydata.json \
./myschema.json

The following command loads a local CSV file (mydata.csv) into a table named mytable in mydataset in myotherproject. The schema is defined inline in the format FIELD:DATA_TYPE,FIELD:DATA_TYPE.

bq load \
--source_format=CSV \
myotherproject:mydataset.mytable \
./mydata.csv \
qtr:STRING,sales:FLOAT,year:STRING

Note: When you specify the schema on the command line, you cannot include a RECORD (STRUCT) type, you cannot include a field description, and you cannot specify the field mode. All field modes default to NULLABLE. To include field descriptions, modes, and RECORD types, supply a JSON schema file instead.

The following command loads a local CSV file (mydata.csv) into a table named mytable in mydataset in your default project. The schema is defined using schema auto-detection.

bq load \
--autodetect \
--source_format=CSV \
mydataset.mytable \
./mydata.csv

C#

Before trying this sample, follow the C# setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery C# API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

The following code demonstrates how to load a local CSV file to a new BigQuery table. To load a local file of another format, use the update options class for the appropriate format from the JobCreationOptions base class instead of UploadCsvOptions.

using Google.Cloud.BigQuery.V2;
using System;
using System.IO;

public class BigQueryLoadFromFile
{
    public void LoadFromFile(
        string projectId = "your-project-id",
        string datasetId = "your_dataset_id",
        string tableId = "your_table_id",
        string filePath = "path/to/file.csv"
    )
    {
        BigQueryClient client = BigQueryClient.Create(projectId);
        // Create job configuration
        var uploadCsvOptions = new UploadCsvOptions()
        {
            SkipLeadingRows = 1,  // Skips the file headers
            Autodetect = true
        };
        using (FileStream stream = File.Open(filePath, FileMode.Open))
        {
            // Create and run job
            // Note that there are methods available for formats other than CSV
            BigQueryJob job = client.UploadCsv(
                datasetId, tableId, null, stream, uploadCsvOptions);
            job = job.PollUntilCompleted().ThrowOnAnyError();  // Waits for the job to complete.

            // Display the number of rows uploaded
            BigQueryTable table = client.GetTable(datasetId, tableId);
            Console.WriteLine(
                $"Loaded {table.Resource.NumRows} rows to {table.FullyQualifiedId}");
        }
    }
}

Go

Before trying this sample, follow the Go setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Go API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

The following code demonstrates how to load a local CSV file to a new BigQuery table. To load a local file of another format, set the DataFormat property of the NewReaderSource to the appropriate format.

import (
	"context"
	"fmt"
	"os"

	"cloud.google.com/go/bigquery"
)

// importCSVFromFile demonstrates loading data into a BigQuery table using a file on the local filesystem.
func importCSVFromFile(projectID, datasetID, tableID, filename string) error {
	// projectID := "my-project-id"
	// datasetID := "mydataset"
	// tableID := "mytable"
	ctx := context.Background()
	client, err := bigquery.NewClient(ctx, projectID)
	if err != nil {
		return fmt.Errorf("bigquery.NewClient: %v", err)
	}
	defer client.Close()

	f, err := os.Open(filename)
	if err != nil {
		return err
	}
	source := bigquery.NewReaderSource(f)
	source.AutoDetect = true   // Allow BigQuery to determine schema.
	source.SkipLeadingRows = 1 // CSV has a single header line.

	loader := client.Dataset(datasetID).Table(tableID).LoaderFrom(source)

	job, err := loader.Run(ctx)
	if err != nil {
		return err
	}
	status, err := job.Wait(ctx)
	if err != nil {
		return err
	}
	if err := status.Err(); err != nil {
		return err
	}
	return nil
}

Java

Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

The following code demonstrates how to load a local CSV file to a new BigQuery table. To load a local file of another format, set the FormatOptions to the appropriate format.

TableId tableId = TableId.of(datasetName, tableName);
WriteChannelConfiguration writeChannelConfiguration =
    WriteChannelConfiguration.newBuilder(tableId).setFormatOptions(FormatOptions.csv()).build();
// The location must be specified; other fields can be auto-detected.
JobId jobId = JobId.newBuilder().setLocation(location).build();
TableDataWriteChannel writer = bigquery.writer(jobId, writeChannelConfiguration);
// Write data to writer
try (OutputStream stream = Channels.newOutputStream(writer)) {
  Files.copy(csvPath, stream);
}
// Get load job
Job job = writer.getJob();
job = job.waitFor();
LoadStatistics stats = job.getStatistics();
return stats.getOutputRows();

Node.js

Before trying this sample, follow the Node.js setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Node.js API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

The following code demonstrates how to load a local CSV file to a new BigQuery table. To load a local file of another format, set the metadata parameter of the load function to the appropriate format.

// Imports the Google Cloud client library
const {BigQuery} = require('@google-cloud/bigquery');
const bigquery = new BigQuery();

async function loadLocalFile() {
  // Imports a local file into a table.

  /**
   * TODO(developer): Uncomment the following lines before running the sample.
   */
  // const filename = '/path/to/file.csv';
  // const datasetId = 'my_dataset';
  // const tableId = 'my_table';

  // Load data from a local file into the table
  const [job] = await bigquery
    .dataset(datasetId)
    .table(tableId)
    .load(filename);

  console.log(`Job ${job.id} completed.`);

  // Check the job's status for errors
  const errors = job.status.errors;
  if (errors && errors.length > 0) {
    throw errors;
  }
}

PHP

Before trying this sample, follow the PHP setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery PHP API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

The following code demonstrates how to load a local CSV file to a new BigQuery table. To load a local file of another format, set the sourceFormat to the appropriate format.

use Google\Cloud\BigQuery\BigQueryClient;
use Google\Cloud\Core\ExponentialBackoff;

/** Uncomment and populate these variables in your code */
// $projectId  = 'The Google project ID';
// $datasetId  = 'The BigQuery dataset ID';
// $tableId    = 'The BigQuery table ID';
// $source     = 'The path to the CSV source file to import';

// instantiate the bigquery table service
$bigQuery = new BigQueryClient([
    'projectId' => $projectId,
]);
$dataset = $bigQuery->dataset($datasetId);
$table = $dataset->table($tableId);
// create the import job
$loadConfig = $table->load(fopen($source, 'r'))->sourceFormat('CSV');
$job = $table->runJob($loadConfig);
// poll the job until it is complete
$backoff = new ExponentialBackoff(10);
$backoff->execute(function () use ($job) {
    printf('Waiting for job to complete' . PHP_EOL);
    $job->reload();
    if (!$job->isComplete()) {
        throw new Exception('Job has not yet completed', 500);
    }
});
// check if the job has errors
if (isset($job->info()['status']['errorResult'])) {
    $error = $job->info()['status']['errorResult']['message'];
    printf('Error running job: %s' . PHP_EOL, $error);
} else {
    print('Data imported successfully' . PHP_EOL);
}

Python

Before trying this sample, follow the Python setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Python API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

The following code demonstrates how to load a local CSV file to a new BigQuery table. To load a local file of another format, set the LoadJobConfig.source_format property to the appropriate format.

from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

# TODO(developer): Set table_id to the ID of the table to create.
# table_id = "your-project.your_dataset.your_table_name"

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,
)

with open(file_path, "rb") as source_file:
    job = client.load_table_from_file(source_file, table_id, job_config=job_config)

job.result()  # Waits for the job to complete.

table = client.get_table(table_id)  # Make an API request.
print(
    "Loaded {} rows and {} columns to {}".format(
        table.num_rows, len(table.schema), table_id
    )
)
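
If you need field descriptions, explicit modes, or nested RECORD fields, which can't be expressed in an inline command-line schema, you can replace auto-detection with an explicit schema in the job configuration. The following is a minimal sketch; the field names are illustrative, and nested fields require a source format such as newline-delimited JSON rather than CSV:

from google.cloud import bigquery

# Explicit schema with modes, descriptions, and a nested RECORD field.
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    schema=[
        bigquery.SchemaField("qtr", "STRING", mode="REQUIRED", description="Quarter"),
        bigquery.SchemaField("sales", "FLOAT", mode="NULLABLE"),
        bigquery.SchemaField(
            "region",
            "RECORD",
            mode="NULLABLE",
            fields=[
                bigquery.SchemaField("country", "STRING"),
                bigquery.SchemaField("city", "STRING"),
            ],
        ),
    ],
)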

Ruby

Before trying this sample, follow the Ruby setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Ruby API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

The following code demonstrates how to load a local CSV file to a new BigQuery table. To load a local file of another format, set the format parameter of the Table#load_job method to the appropriate format.

require"google/cloud/bigquery"defload_from_filedataset_id="your_dataset_id",file_path="path/to/file.csv"bigquery=Google::Cloud::Bigquery.newdataset=bigquery.datasetdataset_idtable_id="new_table_id"# Infer the config.location based on the location of the referenced dataset.load_job=dataset.load_jobtable_id,file_pathdo|config|config.skip_leading=1config.autodetect=trueendload_job.wait_until_done!# Waits for table load to complete.table=dataset.tabletable_idputs"Loaded#{table.rows_count} rows into#{table.id}"end

Limitations

Loading data from a local data source is subject to the following limitations:

  • Wildcards and comma-separated lists are not supported when you load files from a local data source. Files must be loaded individually.
  • When using the Google Cloud console, files loaded from a local data source cannot exceed 100 MB. For larger files, load the file from Cloud Storage.

Load job capacity

Similar to the on-demand mode for queries, load jobs by default use a shared pool of slots. BigQuery doesn't guarantee the available capacity of this shared pool or load job throughput.

To increase throughput or predictably control the capacity of your load jobs, you can create a slot reservation and assign dedicated PIPELINE slots to run load jobs. For more information, see Reservation assignments.

Loading compressed and uncompressed data

For Avro, Parquet, and ORC formats, BigQuery supports loading files where the file data has been compressed using a supported codec. However, BigQuery doesn't support loading files in these formats that have themselves been compressed, for example by using the gzip utility.

The Avro binary format is the preferred format for loading both compressed and uncompressed data. Avro data is faster to load because the data can be read in parallel, even when the data blocks are compressed. For a list of supported compression codecs, see Avro compression.

Parquet binary format is also a good choice because Parquet's efficient, per-column encoding typically results in a better compression ratio and smaller files. Parquet files also leverage compression techniques that allow files to be loaded in parallel. For a list of supported compression codecs, see Parquet compression.

The ORC binary format offers benefits similar to the benefits of the Parquet format. Data in ORC files is fast to load because data stripes can be read in parallel. The rows in each data stripe are loaded sequentially. To optimize load time, use a data stripe size of approximately 256 MB or less. For a list of supported compression codecs, see ORC compression.

For other data formats such as CSV and JSON, BigQuery can load uncompressed files significantly faster than compressed files because uncompressed files can be read in parallel. Because uncompressed files are larger, using them can lead to bandwidth limitations and higher Cloud Storage costs for data staged in Cloud Storage prior to being loaded into BigQuery. Keep in mind that line ordering isn't guaranteed for compressed or uncompressed files. It's important to weigh these tradeoffs depending on your use case.

In general, if bandwidth is limited, compress your CSV and JSON files by using gzip before uploading them to Cloud Storage. gzip is the only supported file compression type for CSV and JSON files when loading data into BigQuery. If loading speed is important to your app and you have a lot of bandwidth to load your data, leave your files uncompressed.
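
As a rough sketch of that workflow, the following Python snippet compresses a local CSV file with gzip and stages it in Cloud Storage before loading. The file, bucket, and object names are placeholders:

import gzip
import shutil

from google.cloud import storage

# Compress the local file (placeholder names).
with open("mydata.csv", "rb") as src, gzip.open("mydata.csv.gz", "wb") as dst:
    shutil.copyfileobj(src, dst)

# Stage the compressed file in Cloud Storage for loading.
bucket = storage.Client().bucket("mybucket")
bucket.blob("staging/mydata.csv.gz").upload_from_filename("mydata.csv.gz")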

Appending to or overwriting a table

You can load additional data into a table either from source files or by appending query results. If the schema of the data does not match the schema of the destination table or partition, you can update the schema when you append to it or overwrite it.

If you update the schema when appending data, BigQuery allows you to:

  • Add new fields
  • Relax REQUIRED fields to NULLABLE

If you are overwriting a table, the schema is always overwritten. Schema updates are not restricted when you overwrite a table.

In the Google Cloud console, use the Write preference option to specify what action to take when you load data from a source file or from a query result. The bq command-line tool and the API include the following options:

  • Write if empty: No bq tool flag. BigQuery API property: WRITE_EMPTY. Writes the data only if the table is empty.
  • Append to table: bq tool flag --noreplace or --replace=false (if --replace is unspecified, the default is to append). BigQuery API property: WRITE_APPEND. Appends the data to the end of the table. This is the default.
  • Overwrite table: bq tool flag --replace or --replace=true. BigQuery API property: WRITE_TRUNCATE. Erases all existing data in a table before writing the new data.
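
In the client libraries, the same choice is made through the write disposition on the load job configuration. For example, a minimal Python sketch (with a placeholder table ID and local CSV file) that corresponds to the Overwrite table option might look like this:

from google.cloud import bigquery

client = bigquery.Client()

# Placeholder table ID.
table_id = "your-project.mydataset.mytable"

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,
    # WRITE_TRUNCATE overwrites the table; WRITE_APPEND and WRITE_EMPTY are the other options.
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)

with open("mydata.csv", "rb") as source_file:
    job = client.load_table_from_file(source_file, table_id, job_config=job_config)

job.result()  # Waits for the job to complete.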

Quota policy

For information about the quota policy for batch loading data, see Load jobs on the Quotas and limits page.

View current quota usage

You can view your current usage of query, load, extract, or copy jobs by running an INFORMATION_SCHEMA query to view metadata about the jobs run over a specified time period. You can compare your current usage against the quota limit to determine your quota usage for a particular type of job. The following example query uses the INFORMATION_SCHEMA.JOBS_BY_PROJECT view to list the number of query, load, extract, and copy jobs by project:

SELECT
  sum(case when job_type = "QUERY" then 1 else 0 end) as QRY_CNT,
  sum(case when job_type = "LOAD" then 1 else 0 end) as LOAD_CNT,
  sum(case when job_type = "EXTRACT" then 1 else 0 end) as EXT_CNT,
  sum(case when job_type = "COPY" then 1 else 0 end) as CPY_CNT
FROM `region-REGION_NAME`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE date(creation_time) = CURRENT_DATE()
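
If you want to run this check from code, the query can be issued through any of the client libraries. A minimal Python sketch, using region-us as an example region qualifier, might look like this:

from google.cloud import bigquery

client = bigquery.Client()

# "region-us" is an example; use the region that holds your jobs.
sql = """
SELECT
  SUM(CASE WHEN job_type = "LOAD" THEN 1 ELSE 0 END) AS load_cnt
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE DATE(creation_time) = CURRENT_DATE()
"""

for row in client.query(sql).result():
    print(f"Load jobs today: {row.load_cnt}")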

Pricing

There is no charge for batch loading data into BigQuery using the shared slot pool. For more information, see BigQuery data ingestion pricing.

Example use case

Suppose there is a nightly batch processing pipeline that needs to be completed by a fixed deadline. Data needs to be available by this deadline for further processing by another batch process to generate reports to be sent to a regulator. This use case is common in regulated industries such as finance.

Batch loading of data with load jobs is the right approach for this use case because latency is not a concern provided the deadline can be met. Ensure your Cloud Storage buckets meet the location requirements for loading data into the BigQuery dataset.

The result of a BigQuery load job is atomic; either all records get inserted or none do. As a best practice, when inserting all data in a single load job, create a new table by using the WRITE_TRUNCATE disposition of the JobConfigurationLoad resource. This is important when retrying a failed load job, because the client might not be able to distinguish between a job that actually failed and one that succeeded but whose success state was never communicated back to the client, for example because of a transient error.

Assuming data to be ingested has been successfully copied to Cloud Storage already, retrying with exponential backoff is sufficient to address ingestion failures.
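
A minimal Python sketch of that retry pattern is shown below. The URI, table ID, and retry limits are illustrative; because each attempt uses WRITE_TRUNCATE, a retry after an ambiguous failure replaces the table rather than duplicating rows:

import time

from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,
    # Each attempt replaces the table, so retries are safe.
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)

for attempt in range(5):  # Placeholder retry limit.
    try:
        job = client.load_table_from_uri(
            "gs://mybucket/nightly/*.csv",           # Placeholder URI.
            "your-project.mydataset.nightly_table",  # Placeholder table ID.
            job_config=job_config,
        )
        job.result()  # Raises if the load job fails.
        break
    except Exception:
        time.sleep(2**attempt)  # Exponential backoff before the next attempt.
else:
    raise RuntimeError("Load job did not succeed after retries.")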

A nightly batch job doesn't hit the default quota of 1,500 load jobs per table per day, even with retries. When loading data incrementally, the default quota is also sufficient to run a load job every 5 minutes (288 jobs per day) and still have unconsumed quota for, on average, at least one retry per job.
