Batch loading data
You can load data into BigQuery from Cloud Storage or from a local file as a batch operation. The source data can be in any of the following formats:
- Avro
- Comma-separated values (CSV)
- JSON (newline-delimited)
- ORC
- Parquet
- Datastore exports stored in Cloud Storage
- Firestore exports stored in Cloud Storage
You can also use BigQuery Data Transfer Service to set up recurring loads from Cloud Storage into BigQuery.
Before you begin
Grant Identity and Access Management (IAM) roles that give users the necessary permissions to perform each task in this document, and create a dataset to store your data.
Required permissions
To load data into BigQuery, you need IAM permissions to run a load job and load data into BigQuery tables and partitions. If you are loading data from Cloud Storage, you also need IAM permissions to access the bucket that contains your data.
Permissions to load data into BigQuery
To load data into a new BigQuery table or partition or to append or overwrite an existing table or partition, you need the following IAM permissions:
- bigquery.tables.create
- bigquery.tables.updateData
- bigquery.tables.update
- bigquery.jobs.create
Each of the following predefined IAM roles includes the permissions that you need in order to load data into a BigQuery table or partition:
- roles/bigquery.dataEditor
- roles/bigquery.dataOwner
- roles/bigquery.admin (includes the bigquery.jobs.create permission)
- bigquery.user (includes the bigquery.jobs.create permission)
- bigquery.jobUser (includes the bigquery.jobs.create permission)
Additionally, if you have the bigquery.datasets.create permission, you can create and update tables using a load job in the datasets that you create.
For more information on IAM roles and permissions in BigQuery, see Predefined roles and permissions.
Permissions to load data from Cloud Storage
To get the permissions that you need to load data from a Cloud Storage bucket, ask your administrator to grant you the Storage Admin (roles/storage.admin) IAM role on the bucket. For more information about granting roles, see Manage access to projects, folders, and organizations.
This predefined role contains the permissions required to load data from a Cloud Storage bucket. To see the exact permissions that are required, expand the Required permissions section:
Required permissions
The following permissions are required to load data from a Cloud Storage bucket:
- storage.buckets.get
- storage.objects.get
- storage.objects.list (required if you are using a URI wildcard)
You might also be able to get these permissions with custom roles or other predefined roles.
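If you manage bucket access programmatically, the role can also be granted with the Cloud Storage client library. The following Python sketch is only an illustration; the bucket name and principal are hypothetical placeholders:

from google.cloud import storage

# Hypothetical values for illustration only.
bucket_name = "my-bucket"
member = "user:data-loader@example.com"

client = storage.Client()
bucket = client.bucket(bucket_name)

# Read the bucket's current IAM policy (version 3 supports conditional bindings).
policy = bucket.get_iam_policy(requested_policy_version=3)

# Add a binding that grants Storage Admin on this bucket only.
policy.bindings.append({"role": "roles/storage.admin", "members": {member}})
bucket.set_iam_policy(policy)
print(f"Granted roles/storage.admin on {bucket_name} to {member}")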
Create a dataset
Create a BigQuery dataset to store your data.
Loading data from Cloud Storage
BigQuery supports loading data from any of the following Cloud Storage storage classes:
- Standard
- Nearline
- Coldline
- Archive
To learn how to load data into BigQuery, see the page for your data format.
To learn how to configure a recurring load from Cloud Storage into BigQuery, see Cloud Storage transfers.
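As a minimal illustration of a Cloud Storage load, the following Python sketch loads a Parquet file into a table by using load_table_from_uri. The bucket, object, and table names are placeholders:

from google.cloud import bigquery

client = bigquery.Client()

# Placeholder identifiers for illustration.
table_id = "your-project.your_dataset.your_table"
uri = "gs://mybucket/myfile.parquet"

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.PARQUET,
)

load_job = client.load_table_from_uri(uri, table_id, job_config=job_config)
load_job.result()  # Waits for the load job to complete.

table = client.get_table(table_id)
print(f"Loaded {table.num_rows} rows into {table_id}.")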
Location considerations
You cannot change the location of a dataset after it is created, but you can make a copy of the dataset or manually move it. For more information, see the documentation on copying datasets and moving datasets.
Retrieving the Cloud Storage URI
To load data from a Cloud Storage data source, you must provide the Cloud Storage URI.
The Cloud Storage resource path contains your bucket name and your object (filename). For example, if the Cloud Storage bucket is named mybucket and the data file is named myfile.csv, the resource path would be gs://mybucket/myfile.csv.
BigQuery does not support Cloud Storage resource paths that include multiple consecutive slashes after the initial double slash. Cloud Storage object names can contain multiple consecutive slash ("/") characters. However, BigQuery converts multiple consecutive slashes into a single slash. For example, the following resource path, though valid in Cloud Storage, does not work in BigQuery: gs://bucket/my//object//name.
To retrieve the Cloud Storage resource path:
Open the Cloud Storage console.
Browse to the location of the object (file) that contains the source data.
Click on the name of the object.
The Object details page opens.
Copy the value provided in the gsutil URI field, which begins with gs://.
You can also use the gcloud storage ls command to list buckets or objects. For Google Datastore exports, only one URI can be specified, and it must end with .backup_info or .export_metadata.
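If you prefer to look up object paths programmatically rather than in the console, a minimal sketch like the following (using the Cloud Storage Python client and a placeholder bucket name and prefix) lists objects and prints their gs:// URIs:

from google.cloud import storage

client = storage.Client()
bucket_name = "mybucket"  # Placeholder bucket name.

# List objects under an optional prefix and print their gs:// URIs.
for blob in client.list_blobs(bucket_name, prefix="fed-samples/"):
    print(f"gs://{bucket_name}/{blob.name}")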
Wildcard support for Cloud Storage URIs
If your data is separated into multiple files, you can use an asterisk (*) wildcard to select multiple files. Use of the asterisk wildcard must follow these rules:
- The asterisk can appear inside the object name or at the end of the object name.
- Using multiple asterisks is unsupported. For example, the path gs://mybucket/fed-*/temp/*.csv is invalid.
- Using an asterisk with the bucket name is unsupported.
Examples:
The following example shows how to select all of the files in all the folders that start with the prefix gs://mybucket/fed-samples/fed-sample:

gs://mybucket/fed-samples/fed-sample*

The following example shows how to select only files with a .csv extension in the folder named fed-samples and any subfolders of fed-samples:

gs://mybucket/fed-samples/*.csv

The following example shows how to select files with a naming pattern of fed-sample*.csv in the folder named fed-samples. This example doesn't select files in subfolders of fed-samples.

gs://mybucket/fed-samples/fed-sample*.csv
When using the bq command-line tool, you might need to escape the asterisk on some platforms.
You can't use an asterisk wildcard when you load Datastore or Firestore export data from Cloud Storage.
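For supported formats, a wildcard URI can be passed to a load job in the same way as a single-object URI. The following Python sketch (with placeholder bucket and table names) loads every .csv file under fed-samples:

from google.cloud import bigquery

client = bigquery.Client()
table_id = "your-project.your_dataset.your_table"  # Placeholder.

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,
)

# The asterisk matches the files described in the examples above.
uri = "gs://mybucket/fed-samples/*.csv"

load_job = client.load_table_from_uri(uri, table_id, job_config=job_config)
load_job.result()
print(f"Loaded {client.get_table(table_id).num_rows} rows.")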
Limitations
You are subject to the following limitations when you load data into BigQuery from a Cloud Storage bucket:
- BigQuery does not guarantee data consistency for external data sources. Changes to the underlying data while a query is running can result in unexpected behavior.
- BigQuery doesn't support Cloud Storage object versioning. If you include a generation number in the Cloud Storage URI, then the load job fails.
Depending on the format of your Cloud Storage source data, there may be additional limitations. For more information, see:
- CSV limitations
- JSON limitations
- Datastore export limitations
- Firestore export limitations
- Limitations on nested and repeated data
Loading data from local files
You can load data from a readable data source (such as your local machine) by using one of the following:
- The Google Cloud console
- The bq command-line tool's bq load command
- The API
- The client libraries
When you load data using the Google Cloud console or the bq command-line tool, a load job is automatically created.
To load data from a local data source:
Console
Open the BigQuery page in the Google Cloud console.
In the left pane, click Explorer.
If you don't see the left pane, click Expand left pane to open the pane.
In the Explorer pane, expand your project, click Datasets, and then select a dataset.
In the details pane, click Create table.
On the Create table page, in the Source section:
- For Create table from, select Upload.
- For Select file, click Browse.
- Browse to the file, and click Open. Note that wildcards and comma-separated lists are not supported for local files.
- For File format, select CSV, JSON (newline delimited), Avro, Parquet, or ORC.
On the Create table page, in the Destination section:
- For Project, choose the appropriate project.
- For Dataset, choose the appropriate dataset.
- In the Table field, enter the name of the table you're creating in BigQuery.
- Verify that Table type is set to Native table.
In the Schema section, enter the schema definition.
For CSV and JSON files, you can check the Auto-detect option to enable schema auto-detect. Schema information is self-described in the source data for the other supported file types.
You can also enter schema information manually by:
- Clicking Edit as text and entering the table schema as a JSON array. Note: You can view the schema of an existing table in JSON format by entering the following command: bq show --format=prettyjson dataset.table.
- Using Add Field to manually input the schema.
Select applicable items in the Advanced options section. For information on the available options, see CSV options and JSON options.
Optional: In the Advanced options section, choose the write disposition:
- Write if empty: Write the data only if the table is empty.
- Append to table: Append the data to the end of the table. This setting is the default.
- Overwrite table: Erase all existing data in the table before writing the new data.
Click Create table.
bq
Use the bq load command, specify the source_format, and include the path to the local file.
(Optional) Supply the --location flag and set the value to your location.
If you are loading data in a project other than your default project, add the project ID to the dataset in the following format: PROJECT_ID:DATASET.
bq --location=LOCATION load \
--source_format=FORMAT \
PROJECT_ID:DATASET.TABLE \
PATH_TO_SOURCE \
SCHEMA
Replace the following:
- LOCATION: your location. The --location flag is optional. For example, if you are using BigQuery in the Tokyo region, set the flag's value to asia-northeast1. You can set a default value for the location by using the .bigqueryrc file.
- FORMAT: CSV, AVRO, PARQUET, ORC, or NEWLINE_DELIMITED_JSON.
- PROJECT_ID: your project ID.
- DATASET: an existing dataset.
- TABLE: the name of the table into which you're loading data.
- PATH_TO_SOURCE: the path to the local file.
- SCHEMA: a valid schema. The schema can be a local JSON file, or it can be typed inline as part of the command. You can also use the --autodetect flag instead of supplying a schema definition.
In addition, you can add flags for options that let you control how BigQuery parses your data. For example, you can use the --skip_leading_rows flag to ignore header rows in a CSV file. For more information, see CSV options and JSON options.
Examples:
The following command loads a local newline-delimited JSON file (mydata.json) into a table named mytable in mydataset in your default project. The schema is defined in a local schema file named myschema.json.
bq load \
--source_format=NEWLINE_DELIMITED_JSON \
mydataset.mytable \
./mydata.json \
./myschema.json

The following command loads a local CSV file (mydata.csv) into a table named mytable in mydataset in myotherproject. The schema is defined inline in the format FIELD:DATA_TYPE,FIELD:DATA_TYPE.
bq load \
--source_format=CSV \
myotherproject:mydataset.mytable \
./mydata.csv \
qtr:STRING,sales:FLOAT,year:STRING

Note: When you specify the schema inline, you cannot include a RECORD (STRUCT) type, you cannot include a field description, and you cannot specify the field mode. All field modes default to NULLABLE. To include field descriptions, modes, and RECORD types, supply a JSON schema file instead.

The following command loads a local CSV file (mydata.csv) into a table named mytable in mydataset in your default project. The schema is defined using schema auto-detection.
bq load \
--autodetect \
--source_format=CSV \
mydataset.mytable \
./mydata.csv

C#
Before trying this sample, follow the C# setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery C# API reference documentation. To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries. The following example loads a local CSV file by using UploadCsv with UploadCsvOptions; to load a local file of another format, use the upload method and options class for that format instead of UploadCsvOptions.
using Google.Cloud.BigQuery.V2;
using System;
using System.IO;

public class BigQueryLoadFromFile
{
    public void LoadFromFile(
        string projectId = "your-project-id",
        string datasetId = "your_dataset_id",
        string tableId = "your_table_id",
        string filePath = "path/to/file.csv"
    )
    {
        BigQueryClient client = BigQueryClient.Create(projectId);
        // Create job configuration
        var uploadCsvOptions = new UploadCsvOptions()
        {
            SkipLeadingRows = 1,  // Skips the file headers
            Autodetect = true
        };
        using (FileStream stream = File.Open(filePath, FileMode.Open))
        {
            // Create and run job
            // Note that there are methods available for formats other than CSV
            BigQueryJob job = client.UploadCsv(
                datasetId, tableId, null, stream, uploadCsvOptions);
            job = job.PollUntilCompleted().ThrowOnAnyError();  // Waits for the job to complete.

            // Display the number of rows uploaded
            BigQueryTable table = client.GetTable(datasetId, tableId);
            Console.WriteLine(
                $"Loaded {table.Resource.NumRows} rows to {table.FullyQualifiedId}");
        }
    }
}

Go
Before trying this sample, follow the Go setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Go API reference documentation. To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries. To load a local file of another format, set the source format of the NewReaderSource to the appropriate format.
import (
    "context"
    "fmt"
    "os"

    "cloud.google.com/go/bigquery"
)

// importCSVFromFile demonstrates loading data into a BigQuery table using a file on the local filesystem.
func importCSVFromFile(projectID, datasetID, tableID, filename string) error {
    // projectID := "my-project-id"
    // datasetID := "mydataset"
    // tableID := "mytable"
    ctx := context.Background()
    client, err := bigquery.NewClient(ctx, projectID)
    if err != nil {
        return fmt.Errorf("bigquery.NewClient: %v", err)
    }
    defer client.Close()

    f, err := os.Open(filename)
    if err != nil {
        return err
    }
    source := bigquery.NewReaderSource(f)
    source.AutoDetect = true   // Allow BigQuery to determine schema.
    source.SkipLeadingRows = 1 // CSV has a single header line.

    loader := client.Dataset(datasetID).Table(tableID).LoaderFrom(source)

    job, err := loader.Run(ctx)
    if err != nil {
        return err
    }
    status, err := job.Wait(ctx)
    if err != nil {
        return err
    }
    if err := status.Err(); err != nil {
        return err
    }
    return nil
}

Java
Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation. To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
TableId tableId = TableId.of(datasetName, tableName);
WriteChannelConfiguration writeChannelConfiguration =
    WriteChannelConfiguration.newBuilder(tableId).setFormatOptions(FormatOptions.csv()).build();
// The location must be specified; other fields can be auto-detected.
JobId jobId = JobId.newBuilder().setLocation(location).build();
TableDataWriteChannel writer = bigquery.writer(jobId, writeChannelConfiguration);
// Write data to writer
try (OutputStream stream = Channels.newOutputStream(writer)) {
  Files.copy(csvPath, stream);
}
// Get load job
Job job = writer.getJob();
job = job.waitFor();
LoadStatistics stats = job.getStatistics();
return stats.getOutputRows();

Node.js
Before trying this sample, follow the Node.js setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Node.js API reference documentation. To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries. To load a local file of another format, set the metadata parameter of the load function to the appropriate format.
// Imports the Google Cloud client library
const {BigQuery} = require('@google-cloud/bigquery');
const bigquery = new BigQuery();

async function loadLocalFile() {
  // Imports a local file into a table.

  /**
   * TODO(developer): Uncomment the following lines before running the sample.
   */
  // const filename = '/path/to/file.csv';
  // const datasetId = 'my_dataset';
  // const tableId = 'my_table';

  // Load data from a local file into the table
  const [job] = await bigquery
    .dataset(datasetId)
    .table(tableId)
    .load(filename);

  console.log(`Job ${job.id} completed.`);

  // Check the job's status for errors
  const errors = job.status.errors;
  if (errors && errors.length > 0) {
    throw errors;
  }
}

PHP
Before trying this sample, follow the PHP setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery PHP API reference documentation. To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
use Google\Cloud\BigQuery\BigQueryClient;
use Google\Cloud\Core\ExponentialBackoff;

/** Uncomment and populate these variables in your code */
// $projectId = 'The Google project ID';
// $datasetId = 'The BigQuery dataset ID';
// $tableId = 'The BigQuery table ID';
// $source = 'The path to the CSV source file to import';

// instantiate the bigquery table service
$bigQuery = new BigQueryClient([
    'projectId' => $projectId,
]);
$dataset = $bigQuery->dataset($datasetId);
$table = $dataset->table($tableId);

// create the import job
$loadConfig = $table->load(fopen($source, 'r'))->sourceFormat('CSV');
$job = $table->runJob($loadConfig);

// poll the job until it is complete
$backoff = new ExponentialBackoff(10);
$backoff->execute(function () use ($job) {
    printf('Waiting for job to complete' . PHP_EOL);
    $job->reload();
    if (!$job->isComplete()) {
        throw new Exception('Job has not yet completed', 500);
    }
});

// check if the job has errors
if (isset($job->info()['status']['errorResult'])) {
    $error = $job->info()['status']['errorResult']['message'];
    printf('Error running job: %s' . PHP_EOL, $error);
} else {
    print('Data imported successfully' . PHP_EOL);
}

Python
Before trying this sample, follow the Python setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Python API reference documentation. To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

# TODO(developer): Set table_id to the ID of the table to create.
# table_id = "your-project.your_dataset.your_table_name"

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,
)

with open(file_path, "rb") as source_file:
    job = client.load_table_from_file(source_file, table_id, job_config=job_config)

job.result()  # Waits for the job to complete.

table = client.get_table(table_id)  # Make an API request.
print(
    "Loaded {} rows and {} columns to {}".format(
        table.num_rows, len(table.schema), table_id
    )
)

Ruby
Before trying this sample, follow the Ruby setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Ruby API reference documentation. To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries. To load a local file of another format, set the format parameter of the Table#load_job method to the appropriate format.
require "google/cloud/bigquery"

def load_from_file dataset_id = "your_dataset_id",
                   file_path  = "path/to/file.csv"
  bigquery = Google::Cloud::Bigquery.new
  dataset  = bigquery.dataset dataset_id
  table_id = "new_table_id"

  # Infer the config.location based on the location of the referenced dataset.
  load_job = dataset.load_job table_id, file_path do |config|
    config.skip_leading = 1
    config.autodetect   = true
  end
  load_job.wait_until_done! # Waits for table load to complete.

  table = dataset.table table_id
  puts "Loaded #{table.rows_count} rows into #{table.id}"
end

Limitations
Loading data from a local data source is subject to the following limitations:
- Wildcards and comma-separated lists are not supported when you load files from a local data source. Files must be loaded individually.
- When using the Google Cloud console, files loaded from a local data source cannot exceed 100 MB. For larger files, load the file from Cloud Storage.
Load job capacity
Similar to the on-demand mode for queries, load jobs by default use a shared pool of slots. BigQuery doesn't guarantee the available capacity of this shared pool or load job throughput.
To increase throughput or predictably control the capacity of your load jobs, you can create a slot reservation and assign dedicated PIPELINE slots to run load jobs. For more information, see Reservation assignments.
Loading compressed and uncompressed data
For Avro, Parquet, and ORC formats, BigQuery supports loading files where the file data has been compressed using a supported codec. However, BigQuery doesn't support loading files in these formats that have themselves been compressed, for example by using the gzip utility.

The Avro binary format is the preferred format for loading both compressed and uncompressed data. Avro data is faster to load because the data can be read in parallel, even when the data blocks are compressed. For a list of supported compression codecs, see Avro compression.

Parquet binary format is also a good choice because Parquet's efficient, per-column encoding typically results in a better compression ratio and smaller files. Parquet files also leverage compression techniques that allow files to be loaded in parallel. For a list of supported compression codecs, see Parquet compression.

The ORC binary format offers benefits similar to the benefits of the Parquet format. Data in ORC files is fast to load because data stripes can be read in parallel. The rows in each data stripe are loaded sequentially. To optimize load time, use a data stripe size of approximately 256 MB or less. For a list of supported compression codecs, see ORC compression.

For other data formats such as CSV and JSON, BigQuery can load uncompressed files significantly faster than compressed files because uncompressed files can be read in parallel. Because uncompressed files are larger, using them can lead to bandwidth limitations and higher Cloud Storage costs for data staged in Cloud Storage prior to being loaded into BigQuery. Keep in mind that line ordering isn't guaranteed for compressed or uncompressed files. It's important to weigh these tradeoffs depending on your use case.

In general, if bandwidth is limited, compress your CSV and JSON files by using gzip before uploading them to Cloud Storage. gzip is the only supported file compression type for CSV and JSON files when loading data into BigQuery. If loading speed is important to your app and you have a lot of bandwidth to load your data, leave your files uncompressed.
Appending to or overwriting a table
You can load additional data into a table either from source files or by appending query results. If the schema of the data does not match the schema of the destination table or partition, you can update the schema when you append to it or overwrite it.

If you update the schema when appending data, BigQuery allows you to:
- Add new fields
- Relax REQUIRED fields to NULLABLE
If you are overwriting a table, the schema is always overwritten. Schema updates are not restricted when you overwrite a table.

In the Google Cloud console, use the Write preference option to specify what action to take when you load data from a source file or from a query result. The bq command-line tool and the API include the following options:
| Console option | bq tool flag | BigQuery API property | Description |
|---|---|---|---|
| Write if empty | None | WRITE_EMPTY | Writes the data only if the table is empty. |
| Append to table | --noreplace or --replace=false; if --replace is unspecified, the default is append | WRITE_APPEND | (Default) Appends the data to the end of the table. |
| Overwrite table | --replace or --replace=true | WRITE_TRUNCATE | Erases all existing data in a table before writing the new data. |
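In the client libraries, the same choice is expressed through the load job configuration. The following Python sketch (placeholder table and URI) appends CSV data and also allows the schema changes described above, adding new fields and relaxing REQUIRED fields to NULLABLE:

from google.cloud import bigquery

client = bigquery.Client()
table_id = "your-project.your_dataset.your_table"  # Placeholder.
uri = "gs://mybucket/new_data.csv"                 # Placeholder.

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,
    # WRITE_APPEND is the default; WRITE_EMPTY and WRITE_TRUNCATE are the alternatives.
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    # Permit the schema changes that are allowed when appending.
    schema_update_options=[
        bigquery.SchemaUpdateOption.ALLOW_FIELD_ADDITION,
        bigquery.SchemaUpdateOption.ALLOW_FIELD_RELAXATION,
    ],
)

load_job = client.load_table_from_uri(uri, table_id, job_config=job_config)
load_job.result()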
Quota policy
For information about the quota policy for batch loading data, see Load jobs on the Quotas and limits page.
View current quota usage
You can view your current usage of query, load, extract, or copy jobs by running an INFORMATION_SCHEMA query to view metadata about the jobs run over a specified time period. You can compare your current usage against the quota limit to determine your quota usage for a particular type of job. The following example query uses the INFORMATION_SCHEMA.JOBS_BY_PROJECT view to list the number of query, load, extract, and copy jobs by project:
SELECT
  sum(case when job_type = "QUERY" then 1 else 0 end) as QRY_CNT,
  sum(case when job_type = "LOAD" then 1 else 0 end) as LOAD_CNT,
  sum(case when job_type = "EXTRACT" then 1 else 0 end) as EXT_CNT,
  sum(case when job_type = "COPY" then 1 else 0 end) as CPY_CNT
FROM `region-REGION_NAME`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE date(creation_time) = CURRENT_DATE()
Pricing
There is no charge for batch loading data into BigQuery using the shared slot pool. For more information, see BigQuery data ingestion pricing.
Example use case
Suppose there is a nightly batch processing pipeline that needs to be completed by a fixed deadline. Data needs to be available by this deadline for further processing by another batch process to generate reports to be sent to a regulator. This use case is common in regulated industries such as finance.

Batch loading of data with load jobs is the right approach for this use case because latency is not a concern, provided the deadline can be met. Ensure your Cloud Storage buckets meet the location requirements for loading data into the BigQuery dataset.
The result of a BigQuery load job is atomic; either all records get inserted or none do. As a best practice, when inserting all data in a single load job, create a new table by using the WRITE_TRUNCATE disposition of the JobConfigurationLoad resource. This is important when retrying a failed load job, because the client might not be able to distinguish between jobs that failed and jobs that succeeded but failed, for example, to communicate the success state back to the client.

Assuming the data to be ingested has already been successfully copied to Cloud Storage, retrying with exponential backoff is sufficient to address ingestion failures.
It's recommended that a nightly batch job not hit the default quota of 1,500 loads per table per day, even with retries. When loading data incrementally, the default quota is sufficient to run a load job every 5 minutes and still have unconsumed quota for at least one retry per job, on average.
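As a rough sketch of that pattern (placeholder table and URI), the following Python code retries a WRITE_TRUNCATE load job with exponential backoff, so a repeated attempt simply replaces whatever an earlier, ambiguous attempt may have written:

import time

from google.cloud import bigquery

client = bigquery.Client()
table_id = "your-project.your_dataset.report_input"  # Placeholder.
uri = "gs://mybucket/nightly/2025-01-01/*.avro"       # Placeholder.

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.AVRO,
    # Overwriting makes retries idempotent: each attempt replaces the table.
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)

max_attempts = 5
for attempt in range(max_attempts):
    try:
        load_job = client.load_table_from_uri(uri, table_id, job_config=job_config)
        load_job.result()  # Raises if the job finished with an error.
        break
    except Exception as exc:  # Narrow this to retryable errors in real code.
        if attempt == max_attempts - 1:
            raise
        sleep_seconds = 2 ** attempt
        print(f"Load attempt {attempt + 1} failed ({exc}); retrying in {sleep_seconds}s")
        time.sleep(sleep_seconds)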