Loading CSV data from Cloud Storage

When you load CSV data from Cloud Storage, you can load the data into a new table or partition, or you can append to or overwrite an existing table or partition. When your data is loaded into BigQuery, it is converted into columnar format for Capacitor (BigQuery's storage format).

When you load data from Cloud Storage into a BigQuery table, the dataset that contains the table must be in the same regional or multi-regional location as the Cloud Storage bucket.

For information about loading CSV data from a local file, see Loading data into BigQuery from a local data source.

Limitations

You are subject to the following limitations when you load data into BigQuery from a Cloud Storage bucket:

  • BigQuery does not guarantee data consistency for external data sources. Changes to the underlying data while a query is running can result in unexpected behavior.
  • BigQuery doesn't support Cloud Storage object versioning. If you include a generation number in the Cloud Storage URI, then the load job fails.

When you load CSV files into BigQuery, note the following:

  • CSV files don't support nested or repeated data.
  • Remove byte order mark (BOM) characters. They might cause unexpected issues.
  • If you use gzip compression, BigQuery cannot read the data in parallel. Loading compressed CSV data into BigQuery is slower than loading uncompressed data. See Loading compressed and uncompressed data.
  • You cannot include both compressed and uncompressed files in the same load job.
  • The maximum size for a gzip file is 4 GB.
  • Loading CSV data using schema autodetection does not automatically detect headers if all of the columns are string types. In this case, add a numerical column to the input or declare the schema explicitly.
  • When you load CSV or JSON data, values in DATE columns must use the dash (-) separator and the date must be in the following format: YYYY-MM-DD (year-month-day).
  • When you load JSON or CSV data, values in TIMESTAMP columns must use a dash (-) or slash (/) separator for the date portion of the timestamp, and the date must be in one of the following formats: YYYY-MM-DD (year-month-day) or YYYY/MM/DD (year/month/day). The hh:mm:ss (hour-minute-second) portion of the timestamp must use a colon (:) separator. The sketch after this list shows one way to produce values in these formats.
  • Your files must meet the CSV file size limits described in the load jobs limits.
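
The following sketch shows one way to write CSV rows whose DATE and TIMESTAMP values follow the formats described above before loading the file. The file name states.csv and the column layout are placeholders for this illustration, not part of the product documentation.

import csv
from datetime import datetime, timezone

# Hypothetical example: DATE values use YYYY-MM-DD; TIMESTAMP values use a
# dash-separated date and a colon-separated time, as required for CSV loads.
rows = [
    {"name": "Alaska", "admitted": datetime(1959, 1, 3), "loaded_at": datetime.now(timezone.utc)},
]

with open("states.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for row in rows:
        writer.writerow([
            row["name"],
            row["admitted"].strftime("%Y-%m-%d"),           # DATE: 1959-01-03
            row["loaded_at"].strftime("%Y-%m-%d %H:%M:%S"),  # TIMESTAMP: 2024-01-01 12:00:00
        ])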

Before you begin

Grant Identity and Access Management (IAM) roles that give users the necessary permissions to perform each task in this document, and create a dataset to store your data.

Required permissions

To load data into BigQuery, you need IAM permissions to run a load job and load data into BigQuery tables and partitions. If you are loading data from Cloud Storage, you also need IAM permissions to access the bucket that contains your data.

Permissions to load data into BigQuery

To load data into a new BigQuery table or partition or to append or overwrite an existing table or partition, you need the following IAM permissions:

  • bigquery.tables.create
  • bigquery.tables.updateData
  • bigquery.tables.update
  • bigquery.jobs.create

Each of the following predefined IAM roles includes the permissions that you need in order to load data into a BigQuery table or partition:

  • roles/bigquery.dataEditor
  • roles/bigquery.dataOwner
  • roles/bigquery.admin (includes the bigquery.jobs.create permission)
  • bigquery.user (includes the bigquery.jobs.create permission)
  • bigquery.jobUser (includes the bigquery.jobs.create permission)

Additionally, if you have the bigquery.datasets.create permission, you can create and update tables using a load job in the datasets that you create.

For more information on IAM roles and permissions in BigQuery, see Predefined roles and permissions.

Permissions to load data from Cloud Storage

To get the permissions that you need to load data from a Cloud Storage bucket, ask your administrator to grant you the Storage Admin (roles/storage.admin) IAM role on the bucket. For more information about granting roles, see Manage access to projects, folders, and organizations.

This predefined role contains the permissions required to load data from a Cloud Storage bucket. To see the exact permissions that are required, expand the Required permissions section:

Required permissions

The following permissions are required to load data from a Cloud Storage bucket:

  • storage.buckets.get
  • storage.objects.get
  • storage.objects.list (required if you are using a URI wildcard)

You might also be able to get these permissions with custom roles or other predefined roles.

Create a dataset

Create a BigQuery dataset to store your data.

CSV compression

You can use the gzip utility to compress CSV files. Note that gzip performs full-file compression, unlike the file content compression performed by compression codecs for other file formats, such as Avro. Using gzip to compress your CSV files might have a performance impact; for more information about the trade-offs, see Loading compressed and uncompressed data.
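
If you decide to accept slower, non-parallel reads in exchange for smaller uploads, a minimal sketch of compressing a CSV file with Python's standard gzip module looks like the following. The file names are placeholders.

import gzip
import shutil

# Compress mydata.csv into mydata.csv.gz before uploading it to Cloud Storage.
# gzip compresses the whole file, so BigQuery cannot split it for parallel reads.
with open("mydata.csv", "rb") as src, gzip.open("mydata.csv.gz", "wb") as dst:
    shutil.copyfileobj(src, dst)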

Loading CSV data into a table

To load CSV data from Cloud Storage into a new BigQuery table, select one of the following options:

Console


To follow step-by-step guidance for this task directly in the Cloud Shell Editor, click Guide me:

Guide me


  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the left pane, click Explorer.
  3. In the Explorer pane, expand your project, click Datasets, and then select a dataset.
  4. In the Dataset info section, click Create table.
  5. In the Create table pane, specify the following details:
    1. In the Source section, select Google Cloud Storage in the Create table from list. Then, do the following:
      1. Select a file from the Cloud Storage bucket, or enter the Cloud Storage URI. You cannot include multiple URIs in the Google Cloud console, but wildcards are supported. The Cloud Storage bucket must be in the same location as the dataset that contains the table you want to create, append, or overwrite.
      2. For File format, select CSV.
    2. In the Destination section, specify the following details:
      1. For Dataset, select the dataset in which you want to create the table.
      2. In the Table field, enter the name of the table that you want to create.
      3. Verify that the Table type field is set to Native table.
    3. In the Schema section, enter the schema definition. To enable the auto detection of a schema, select Auto detect. You can enter schema information manually by using one of the following methods:
      • Option 1: Click Edit as text and paste the schema in the form of a JSON array. When you use a JSON array, you generate the schema using the same process as creating a JSON schema file. You can view the schema of an existing table in JSON format by entering the following command:
        bq show --format=prettyjson dataset.table
      • Option 2: Click Add field and enter the table schema. Specify each field's Name, Type, and Mode.
    4. Optional: Specify Partition and cluster settings. For more information, see Creating partitioned tables and Creating and using clustered tables.
    5. Click Advanced options and do the following:
      • For Write preference, leave Write if empty selected. This option creates a new table and loads your data into it.
      • For Number of errors allowed, accept the default value of 0 or enter the maximum number of rows containing errors that can be ignored. If the number of rows with errors exceeds this value, the job results in an invalid message and fails. This option applies only to CSV and JSON files.
      • For Time zone, enter the default time zone to apply when parsing timestamp values that have no specific time zone. Check here for more valid time zone names. If this value is not present, timestamp values without a specific time zone are parsed using the default time zone, UTC. (Preview).
      • For Date Format, enter the format elements that define how the DATE values are formatted in the input files. This field expects a SQL-style format (for example, MM/DD/YYYY). If this value is present, this format is the only compatible DATE format. Schema autodetection also decides the DATE column type based on this format instead of the existing format. If this value is not present, the DATE field is parsed with the default formats. (Preview).
      • For Datetime Format, enter the format elements that define how the DATETIME values are formatted in the input files. This field expects a SQL-style format (for example, MM/DD/YYYY HH24:MI:SS.FF3). If this value is present, this format is the only compatible DATETIME format. Schema autodetection also decides the DATETIME column type based on this format instead of the existing format. If this value is not present, the DATETIME field is parsed with the default formats. (Preview).
      • For Time Format, enter the format elements that define how the TIME values are formatted in the input files. This field expects a SQL-style format (for example, HH24:MI:SS.FF3). If this value is present, this format is the only compatible TIME format. Schema autodetection also decides the TIME column type based on this format instead of the existing format. If this value is not present, the TIME field is parsed with the default formats. (Preview).
      • For Timestamp Format, enter the format elements that define how the TIMESTAMP values are formatted in the input files. This field expects a SQL-style format (for example, MM/DD/YYYY HH24:MI:SS.FF3). If this value is present, this format is the only compatible TIMESTAMP format. Schema autodetection also decides the TIMESTAMP column type based on this format instead of the existing format. If this value is not present, the TIMESTAMP field is parsed with the default formats. (Preview).
      • If you want to ignore values in a row that are not present in the table's schema, then select Unknown values.
      • For Field delimiter, choose the character that separates the cells in your CSV file: Comma, Tab, Pipe, or Custom. If you choose Custom, enter the delimiter in the Custom field delimiter box. The default value is Comma.
      • For Source column match, choose one of the following strategies used to match the loaded columns to the schema (Preview).
        • Default: Default behavior is chosen based on how the schema is provided. If autodetect is enabled, then the default behavior is to match columns by name. Otherwise, the default is to match columns by position. This is done to keep the behavior backward-compatible.
        • Position: Matches columns by position, assuming that the columns are ordered the same way as the schema.
        • Name: Matches by name by reading the header row as the column names and reordering columns to match the field names in the schema. Column names are read from the last skipped row based on Header rows to skip.
      • For Header rows to skip, enter the number of header rows to skip at the top of the CSV file. The default value is 0.
      • For Quoted newlines, check Allow quoted newlines to allow quoted data sections that contain newline characters in a CSV file. The default value is false.
      • For Jagged rows, check Allow jagged rows to accept rows in CSV files that are missing trailing optional columns. The missing values are treated as nulls. If unchecked, records with missing trailing columns are treated as bad records, and if there are too many bad records, an invalid error is returned in the job result. The default value is false.
      • For Null markers, enter a list of custom strings that represent NULL values in CSV data. (Preview).
      • For Encryption, click Customer-managed key to use a Cloud Key Management Service key. If you leave the Google-managed key setting, BigQuery encrypts the data at rest.
    6. Click Create table.
Note: When you load data into an empty table by using the Google Cloud console, you cannot add a label, description, table expiration, or partition expiration.

After the table is created, you can update the table's expiration, description, and labels, but you cannot add a partition expiration after a table is created using the Google Cloud console. For more information, see
Managing tables.

SQL

Use the LOAD DATA DDL statement. The following example loads a CSV file into the new table mytable:

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the query editor, enter the following statement:

    LOAD DATA OVERWRITE mydataset.mytable
      (x INT64, y STRING)
    FROM FILES (
      format = 'CSV',
      uris = ['gs://bucket/path/file.csv']);

  3. Click Run.

For more information about how to run queries, see Run an interactive query.
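
If you prefer to run the same LOAD DATA statement from code rather than the query editor, a minimal sketch with the Python client library might look like the following. The dataset, table, and bucket names are placeholders.

from google.cloud import bigquery

client = bigquery.Client()

# Run the LOAD DATA DDL statement as a regular query job.
sql = """
LOAD DATA OVERWRITE mydataset.mytable
  (x INT64, y STRING)
FROM FILES (
  format = 'CSV',
  uris = ['gs://bucket/path/file.csv']);
"""
client.query(sql).result()  # Waits for the statement to finish.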

bq

Use the bq load command, specify CSV using the --source_format flag, and include a Cloud Storage URI. You can include a single URI, a comma-separated list of URIs, or a URI containing a wildcard. Supply the schema inline, in a schema definition file, or use schema auto-detect. If you don't specify a schema, and --autodetect is false, and the destination table exists, then the schema of the destination table is used.

(Optional) Supply the --location flag and set the value to your location.

Other optional flags include:

  • --allow_jagged_rows: When specified, accept rows in CSV files that are missing trailing optional columns. The missing values are treated as nulls. Otherwise, records with missing trailing columns are treated as bad records, and if there are too many bad records, an invalid error is returned in the job result. The default value is false.
  • --allow_quoted_newlines: When specified, allows quoted data sections that contain newline characters in a CSV file. The default value is false.
  • --field_delimiter: The character that indicates the boundary between columns in the data. Both \t and tab are allowed for tab delimiters. The default value is , (comma).
  • --null_marker: An optional custom string that represents a NULL value in CSV data.
  • --null_markers: (Preview) An optional comma-separated list of custom strings that represent NULL values in CSV data. This option cannot be used with the --null_marker flag.
  • --source_column_match: (Preview) Specifies the strategy used to match loaded columns to the schema. You can specify POSITION to match loaded columns by position, assuming that the columns are ordered the same way as the schema. You can also specify NAME to match by name by reading the header row as the column names and reordering columns to match the field names in the schema. If this value is unspecified, then the default is based on how the schema is provided. If --autodetect is enabled, then the default behavior is to match columns by name. Otherwise, the default is to match columns by position.
  • --skip_leading_rows: Specifies the number of header rows to skip at the top of the CSV file. The default value is 0.
  • --quote: The quote character to use to enclose records. The default value is ". To indicate no quote character, use an empty string.
  • --max_bad_records: An integer that specifies the maximum number of bad records allowed before the entire job fails. The default value is 0. At most, five errors of any type are returned regardless of the --max_bad_records value.
  • --ignore_unknown_values: When specified, allows and ignores extra, unrecognized values in CSV or JSON data.
  • --time_zone: (Preview) An optional default time zone to apply when parsing timestamp values that have no specific time zone in CSV or JSON data.
  • --date_format: (Preview) An optional custom string that defines how the DATE values are formatted in CSV or JSON data.
  • --datetime_format: (Preview) An optional custom string that defines how the DATETIME values are formatted in CSV or JSON data.
  • --time_format: (Preview) An optional custom string that defines how the TIME values are formatted in CSV or JSON data.
  • --timestamp_format: (Preview) An optional custom string that defines how the TIMESTAMP values are formatted in CSV or JSON data.
  • --autodetect: When specified, enables schema auto-detection for CSV and JSON data.
  • --time_partitioning_type: Enables time-based partitioning on a table and sets the partition type. Possible values are HOUR, DAY, MONTH, and YEAR. This flag is optional when you create a table partitioned on a DATE, DATETIME, or TIMESTAMP column. The default partition type for time-based partitioning is DAY. You cannot change the partitioning specification on an existing table.
  • --time_partitioning_expiration: An integer that specifies (in seconds) when a time-based partition should be deleted. The expiration time evaluates to the partition's UTC date plus the integer value.
  • --time_partitioning_field: The DATE or TIMESTAMP column used to create a partitioned table. If time-based partitioning is enabled without this value, an ingestion-time partitioned table is created.
  • --require_partition_filter: When enabled, this option requires users to include a WHERE clause that specifies the partitions to query. Requiring a partition filter may reduce cost and improve performance. For more information, see Querying partitioned tables.
  • --clustering_fields: A comma-separated list of up to four column names used to create a clustered table.
  • --destination_kms_key: The Cloud KMS key for encryption of the table data.
  • --column_name_character_map: Defines the scope and handling of characters in column names, with the option of enabling flexible column names. Requires the --autodetect option for CSV files. For more information, see load_option_list.

    For more information, see the bq load command reference, and the documentation for partitioned tables, clustered tables, and table encryption.

To load CSV data into BigQuery, enter the following command:

bq --location=location load \
--source_format=format \
dataset.table \
path_to_source \
schema

Where:

  • location is your location. The --location flag is optional. For example, if you are using BigQuery in the Tokyo region, you can set the flag's value to asia-northeast1. You can set a default value for the location using the .bigqueryrc file.
  • format is CSV.
  • dataset is an existing dataset.
  • table is the name of the table into which you're loading data.
  • path_to_source is a fully qualified Cloud Storage URI or a comma-separated list of URIs. Wildcards are also supported.
  • schema is a valid schema. The schema can be a local JSON file, or it can be typed inline as part of the command. You can also use the --autodetect flag instead of supplying a schema definition.

Examples:

The following command loads data from gs://mybucket/mydata.csv into a table named mytable in mydataset. The schema is defined in a local schema file named myschema.json.

bq load \
--source_format=CSV \
mydataset.mytable \
gs://mybucket/mydata.csv \
./myschema.json

The following command loads data from gs://mybucket/mydata.csv into a table named mytable in mydataset. The schema is defined in a local schema file named myschema.json. The CSV file includes two header rows. If --skip_leading_rows is unspecified, the default behavior is to assume the file does not contain headers.

bq load \
--source_format=CSV \
--skip_leading_rows=2 \
mydataset.mytable \
gs://mybucket/mydata.csv \
./myschema.json

The following command loads data from gs://mybucket/mydata.csv into an ingestion-time partitioned table named mytable in mydataset. The schema is defined in a local schema file named myschema.json.

bq load \
--source_format=CSV \
--time_partitioning_type=DAY \
mydataset.mytable \
gs://mybucket/mydata.csv \
./myschema.json

The following command loads data from gs://mybucket/mydata.csv into a new partitioned table named mytable in mydataset. The table is partitioned on the mytimestamp column. The schema is defined in a local schema file named myschema.json.

bq load \
--source_format=CSV \
--time_partitioning_field mytimestamp \
mydataset.mytable \
gs://mybucket/mydata.csv \
./myschema.json

The following command loads data from gs://mybucket/mydata.csv into a table named mytable in mydataset. The schema is autodetected.

bq load \
--autodetect \
--source_format=CSV \
mydataset.mytable \
gs://mybucket/mydata.csv

The following command loads data from gs://mybucket/mydata.csv into a table named mytable in mydataset. The schema is defined inline in the format field:data_type,field:data_type.

bq load \
--source_format=CSV \
mydataset.mytable \
gs://mybucket/mydata.csv \
qtr:STRING,sales:FLOAT,year:STRING
Note: When you specify the schema using the bq command-line tool, you cannot include a RECORD (STRUCT) type, you cannot include a field description, and you cannot specify the field mode. All field modes default to NULLABLE. To include field descriptions, modes, and RECORD types, supply a JSON schema file instead.
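
As a rough illustration of the kind of JSON schema file the note refers to, the following sketch writes a myschema.json (the file name and fields are placeholders) with modes and descriptions that the inline field:data_type syntax cannot express:

import json

# Hypothetical schema file with modes and descriptions for a bq load job.
schema = [
    {"name": "qtr", "type": "STRING", "mode": "REQUIRED", "description": "quarter"},
    {"name": "sales", "type": "FLOAT", "mode": "NULLABLE", "description": "total sales"},
    {"name": "year", "type": "STRING", "mode": "NULLABLE", "description": "year"},
]

with open("myschema.json", "w") as f:
    json.dump(schema, f, indent=2)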

The following command loads data from multiple files in gs://mybucket/ into a table named mytable in mydataset. The Cloud Storage URI uses a wildcard. The schema is autodetected.

bq load \
--autodetect \
--source_format=CSV \
mydataset.mytable \
gs://mybucket/mydata*.csv

The following command loads data from multiple files in gs://mybucket/ into a table named mytable in mydataset. The command includes a comma-separated list of Cloud Storage URIs with wildcards. The schema is defined in a local schema file named myschema.json.

bq load \
--source_format=CSV \
mydataset.mytable \
"gs://mybucket/00/*.csv","gs://mybucket/01/*.csv" \
./myschema.json

API

  1. Create a load job that points to the source data in Cloud Storage.

  2. (Optional) Specify your location in the location property in the jobReference section of the job resource.

  3. The source URIs property must be fully qualified, in the format gs://bucket/object. Each URI can contain one '*' wildcard character.

  4. Specify the CSV data format by setting the sourceFormat property to CSV.

  5. To check the job status, call jobs.get(job_id), where job_id is the ID of the job returned by the initial request.

    • If status.state = DONE, the job completed successfully.
    • If the status.errorResult property is present, the request failed, and that object includes information describing what went wrong. When a request fails, no table is created and no data is loaded.
    • If status.errorResult is absent, the job finished successfully, although there might have been some nonfatal errors, such as problems importing a few rows. Nonfatal errors are listed in the returned job object's status.errors property.

API notes:

  • Load jobs are atomic and consistent; if a load job fails, none of the data is available, and if a load job succeeds, all of the data is available.

  • As a best practice, generate a unique ID and pass it as jobReference.jobId when calling jobs.insert to create a load job. This approach is more robust to network failure because the client can poll or retry on the known job ID.

  • Calling jobs.insert on a given job ID is idempotent. You can retry as many times as you like on the same job ID, and at most one of those operations will succeed. The sketch after this list shows the same pattern with a client library.
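
For example, a minimal sketch with the Python client library that supplies its own job ID so a failed request can be retried safely. The URI and table names are placeholders.

import uuid
from google.cloud import bigquery

client = bigquery.Client()

# Generate a unique job ID up front so the load can be retried idempotently.
job_id = f"load_csv_{uuid.uuid4()}"

load_job = client.load_table_from_uri(
    "gs://mybucket/mydata.csv",      # placeholder source URI
    "myproject.mydataset.mytable",   # placeholder destination table
    job_id=job_id,
    job_config=bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        autodetect=True,
    ),
)
load_job.result()  # Poll or retry against the same job_id if the call fails.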

C#

Before trying this sample, follow the C# setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery C# API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

using Google.Cloud.BigQuery.V2;
using System;

public class BigQueryLoadTableGcsCsv
{
    public void LoadTableGcsCsv(
        string projectId = "your-project-id",
        string datasetId = "your_dataset_id"
    )
    {
        BigQueryClient client = BigQueryClient.Create(projectId);
        var gcsURI = "gs://cloud-samples-data/bigquery/us-states/us-states.csv";
        var dataset = client.GetDataset(datasetId);
        var schema = new TableSchemaBuilder
        {
            { "name", BigQueryDbType.String },
            { "post_abbr", BigQueryDbType.String }
        }.Build();
        var destinationTableRef = dataset.GetTableReference(tableId: "us_states");
        // Create job configuration
        var jobOptions = new CreateLoadJobOptions()
        {
            // The source format defaults to CSV; line below is optional.
            SourceFormat = FileFormat.Csv,
            SkipLeadingRows = 1
        };
        // Create and run job
        var loadJob = client.CreateLoadJob(
            sourceUri: gcsURI, destination: destinationTableRef,
            schema: schema, options: jobOptions);
        loadJob = loadJob.PollUntilCompleted().ThrowOnAnyError();  // Waits for the job to complete.
        // Display the number of rows uploaded
        BigQueryTable table = client.GetTable(destinationTableRef);
        Console.WriteLine(
            $"Loaded {table.Resource.NumRows} rows to {table.FullyQualifiedId}");
    }
}

Go

Before trying this sample, follow the Go setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Go API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

import("context""fmt""cloud.google.com/go/bigquery")// importCSVExplicitSchema demonstrates loading CSV data from Cloud Storage into a BigQuery// table and providing an explicit schema for the data.funcimportCSVExplicitSchema(projectID,datasetID,tableIDstring)error{// projectID := "my-project-id"// datasetID := "mydataset"// tableID := "mytable"ctx:=context.Background()client,err:=bigquery.NewClient(ctx,projectID)iferr!=nil{returnfmt.Errorf("bigquery.NewClient: %v",err)}deferclient.Close()gcsRef:=bigquery.NewGCSReference("gs://cloud-samples-data/bigquery/us-states/us-states.csv")gcsRef.SkipLeadingRows=1gcsRef.Schema=bigquery.Schema{{Name:"name",Type:bigquery.StringFieldType},{Name:"post_abbr",Type:bigquery.StringFieldType},}loader:=client.Dataset(datasetID).Table(tableID).LoaderFrom(gcsRef)loader.WriteDisposition=bigquery.WriteEmptyjob,err:=loader.Run(ctx)iferr!=nil{returnerr}status,err:=job.Wait(ctx)iferr!=nil{returnerr}ifstatus.Err()!=nil{returnfmt.Errorf("job completed with error: %v",status.Err())}returnnil}

Java

Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryException;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.CsvOptions;
import com.google.cloud.bigquery.Field;
import com.google.cloud.bigquery.Job;
import com.google.cloud.bigquery.JobInfo;
import com.google.cloud.bigquery.LoadJobConfiguration;
import com.google.cloud.bigquery.Schema;
import com.google.cloud.bigquery.StandardSQLTypeName;
import com.google.cloud.bigquery.TableId;

// Sample to load CSV data from Cloud Storage into a new BigQuery table
public class LoadCsvFromGcs {

  public static void runLoadCsvFromGcs() throws Exception {
    // TODO(developer): Replace these variables before running the sample.
    String datasetName = "MY_DATASET_NAME";
    String tableName = "MY_TABLE_NAME";
    String sourceUri = "gs://cloud-samples-data/bigquery/us-states/us-states.csv";
    Schema schema =
        Schema.of(
            Field.of("name", StandardSQLTypeName.STRING),
            Field.of("post_abbr", StandardSQLTypeName.STRING));
    loadCsvFromGcs(datasetName, tableName, sourceUri, schema);
  }

  public static void loadCsvFromGcs(
      String datasetName, String tableName, String sourceUri, Schema schema) {
    try {
      // Initialize client that will be used to send requests. This client only needs to be created
      // once, and can be reused for multiple requests.
      BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

      // Skip header row in the file.
      CsvOptions csvOptions = CsvOptions.newBuilder().setSkipLeadingRows(1).build();

      TableId tableId = TableId.of(datasetName, tableName);
      LoadJobConfiguration loadConfig =
          LoadJobConfiguration.newBuilder(tableId, sourceUri, csvOptions).setSchema(schema).build();

      // Load data from a GCS CSV file into the table
      Job job = bigquery.create(JobInfo.of(loadConfig));
      // Blocks until this load table job completes its execution, either failing or succeeding.
      job = job.waitFor();
      if (job.isDone()) {
        System.out.println("CSV from GCS successfully added during load append job");
      } else {
        System.out.println(
            "BigQuery was unable to load into the table due to an error:"
                + job.getStatus().getError());
      }
    } catch (BigQueryException | InterruptedException e) {
      System.out.println("Column not added during load append \n" + e.toString());
    }
  }
}

Node.js

Before trying this sample, follow the Node.js setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Node.js API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

// Import the Google Cloud client libraries
const {BigQuery} = require('@google-cloud/bigquery');
const {Storage} = require('@google-cloud/storage');

// Instantiate clients
const bigquery = new BigQuery();
const storage = new Storage();

/**
 * This sample loads the CSV file at
 * https://storage.googleapis.com/cloud-samples-data/bigquery/us-states/us-states.csv
 *
 * TODO(developer): Replace the following lines with the path to your file.
 */
const bucketName = 'cloud-samples-data';
const filename = 'bigquery/us-states/us-states.csv';

async function loadCSVFromGCS() {
  // Imports a GCS file into a table with manually defined schema.

  /**
   * TODO(developer): Uncomment the following lines before running the sample.
   */
  // const datasetId = 'my_dataset';
  // const tableId = 'my_table';

  // Configure the load job. For full list of options, see:
  // https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobConfigurationLoad
  const metadata = {
    sourceFormat: 'CSV',
    skipLeadingRows: 1,
    schema: {
      fields: [
        {name: 'name', type: 'STRING'},
        {name: 'post_abbr', type: 'STRING'},
      ],
    },
    location: 'US',
  };

  // Load data from a Google Cloud Storage file into the table
  const [job] = await bigquery
    .dataset(datasetId)
    .table(tableId)
    .load(storage.bucket(bucketName).file(filename), metadata);

  // load() waits for the job to finish
  console.log(`Job ${job.id} completed.`);

  // Check the job's status for errors
  const errors = job.status.errors;
  if (errors && errors.length > 0) {
    throw errors;
  }
}

PHP

Before trying this sample, follow the PHP setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery PHP API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

use Google\Cloud\BigQuery\BigQueryClient;
use Google\Cloud\Core\ExponentialBackoff;

/** Uncomment and populate these variables in your code */
// $projectId  = 'The Google project ID';
// $datasetId  = 'The BigQuery dataset ID';

// instantiate the bigquery table service
$bigQuery = new BigQueryClient([
    'projectId' => $projectId,
]);
$dataset = $bigQuery->dataset($datasetId);
$table = $dataset->table('us_states');

// create the import job
$gcsUri = 'gs://cloud-samples-data/bigquery/us-states/us-states.csv';
$schema = [
    'fields' => [
        ['name' => 'name', 'type' => 'string'],
        ['name' => 'post_abbr', 'type' => 'string']
    ]
];
$loadConfig = $table->loadFromStorage($gcsUri)->schema($schema)->skipLeadingRows(1);
$job = $table->runJob($loadConfig);

// poll the job until it is complete
$backoff = new ExponentialBackoff(10);
$backoff->execute(function () use ($job) {
    print('Waiting for job to complete' . PHP_EOL);
    $job->reload();
    if (!$job->isComplete()) {
        throw new Exception('Job has not yet completed', 500);
    }
});

// check if the job has errors
if (isset($job->info()['status']['errorResult'])) {
    $error = $job->info()['status']['errorResult']['message'];
    printf('Error running job: %s' . PHP_EOL, $error);
} else {
    print('Data imported successfully' . PHP_EOL);
}

Python

Before trying this sample, follow the Python setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Python API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

Use the Client.load_table_from_uri() method to load data from a CSV file in Cloud Storage. Supply an explicit schema definition by setting the LoadJobConfig.schema property to a list of SchemaField objects.

from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

# TODO(developer): Set table_id to the ID of the table to create.
# table_id = "your-project.your_dataset.your_table_name"

job_config = bigquery.LoadJobConfig(
    schema=[
        bigquery.SchemaField("name", "STRING"),
        bigquery.SchemaField("post_abbr", "STRING"),
    ],
    skip_leading_rows=1,
    # The source format defaults to CSV, so the line below is optional.
    source_format=bigquery.SourceFormat.CSV,
)
uri = "gs://cloud-samples-data/bigquery/us-states/us-states.csv"

load_job = client.load_table_from_uri(
    uri, table_id, job_config=job_config
)  # Make an API request.

load_job.result()  # Waits for the job to complete.

destination_table = client.get_table(table_id)  # Make an API request.
print("Loaded {} rows.".format(destination_table.num_rows))

Ruby

Before trying this sample, follow the Ruby setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Ruby API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

require"google/cloud/bigquery"defload_table_gcs_csvdataset_id="your_dataset_id"bigquery=Google::Cloud::Bigquery.newdataset=bigquery.datasetdataset_idgcs_uri="gs://cloud-samples-data/bigquery/us-states/us-states.csv"table_id="us_states"load_job=dataset.load_jobtable_id,gcs_uri,skip_leading:1do|schema|schema.string"name"schema.string"post_abbr"endputs"Starting job#{load_job.job_id}"load_job.wait_until_done!# Waits for table load to complete.puts"Job finished."table=dataset.tabletable_idputs"Loaded#{table.rows_count} rows to table#{table.id}"end

Loading CSV data into a table that uses column-based time partitioning

To load CSV data from Cloud Storage into a BigQuery table that uses column-based time partitioning:

Go

Before trying this sample, follow the Go setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Go API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

import("context""fmt""time""cloud.google.com/go/bigquery")// importPartitionedTable demonstrates specifing time partitioning for a BigQuery table when loading// CSV data from Cloud Storage.funcimportPartitionedTable(projectID,destDatasetID,destTableIDstring)error{// projectID := "my-project-id"// datasetID := "mydataset"// tableID := "mytable"ctx:=context.Background()client,err:=bigquery.NewClient(ctx,projectID)iferr!=nil{returnfmt.Errorf("bigquery.NewClient: %v",err)}deferclient.Close()gcsRef:=bigquery.NewGCSReference("gs://cloud-samples-data/bigquery/us-states/us-states-by-date.csv")gcsRef.SkipLeadingRows=1gcsRef.Schema=bigquery.Schema{{Name:"name",Type:bigquery.StringFieldType},{Name:"post_abbr",Type:bigquery.StringFieldType},{Name:"date",Type:bigquery.DateFieldType},}loader:=client.Dataset(destDatasetID).Table(destTableID).LoaderFrom(gcsRef)loader.TimePartitioning=&bigquery.TimePartitioning{Field:"date",Expiration:90*24*time.Hour,}loader.WriteDisposition=bigquery.WriteEmptyjob,err:=loader.Run(ctx)iferr!=nil{returnerr}status,err:=job.Wait(ctx)iferr!=nil{returnerr}ifstatus.Err()!=nil{returnfmt.Errorf("job completed with error: %v",status.Err())}returnnil}

Java

Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryException;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.Field;
import com.google.cloud.bigquery.FormatOptions;
import com.google.cloud.bigquery.Job;
import com.google.cloud.bigquery.JobId;
import com.google.cloud.bigquery.JobInfo;
import com.google.cloud.bigquery.LoadJobConfiguration;
import com.google.cloud.bigquery.Schema;
import com.google.cloud.bigquery.StandardSQLTypeName;
import com.google.cloud.bigquery.TableId;
import com.google.cloud.bigquery.TimePartitioning;
import java.time.Duration;
import java.time.temporal.ChronoUnit;
import java.util.UUID;

public class LoadPartitionedTable {

  public static void runLoadPartitionedTable() throws Exception {
    // TODO(developer): Replace these variables before running the sample.
    String datasetName = "MY_DATASET_NAME";
    String tableName = "MY_TABLE_NAME";
    String sourceUri = "/path/to/file.csv";
    loadPartitionedTable(datasetName, tableName, sourceUri);
  }

  public static void loadPartitionedTable(String datasetName, String tableName, String sourceUri)
      throws Exception {
    try {
      // Initialize client that will be used to send requests. This client only needs to be created
      // once, and can be reused for multiple requests.
      BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

      TableId tableId = TableId.of(datasetName, tableName);

      Schema schema =
          Schema.of(
              Field.of("name", StandardSQLTypeName.STRING),
              Field.of("post_abbr", StandardSQLTypeName.STRING),
              Field.of("date", StandardSQLTypeName.DATE));

      // Configure time partitioning. For full list of options, see:
      // https://cloud.google.com/bigquery/docs/reference/rest/v2/tables#TimePartitioning
      TimePartitioning partitioning =
          TimePartitioning.newBuilder(TimePartitioning.Type.DAY)
              .setField("date")
              .setExpirationMs(Duration.of(90, ChronoUnit.DAYS).toMillis())
              .build();

      LoadJobConfiguration loadJobConfig =
          LoadJobConfiguration.builder(tableId, sourceUri)
              .setFormatOptions(FormatOptions.csv())
              .setSchema(schema)
              .setTimePartitioning(partitioning)
              .build();

      // Create a job ID so that we can safely retry.
      JobId jobId = JobId.of(UUID.randomUUID().toString());
      Job loadJob = bigquery.create(JobInfo.newBuilder(loadJobConfig).setJobId(jobId).build());

      // Load data from a GCS CSV file into the table.
      // Blocks until this load table job completes its execution, either failing or succeeding.
      Job completedJob = loadJob.waitFor();

      // Check for errors
      if (completedJob == null) {
        throw new Exception("Job not executed since it no longer exists.");
      } else if (completedJob.getStatus().getError() != null) {
        // You can also look at queryJob.getStatus().getExecutionErrors() for all
        // errors, not just the latest one.
        throw new Exception(
            "BigQuery was unable to load into the table due to an error: \n"
                + loadJob.getStatus().getError());
      }
      System.out.println("Data successfully loaded into time partitioned table during load job");
    } catch (BigQueryException | InterruptedException e) {
      System.out.println(
          "Data not loaded into time partitioned table during load job \n" + e.toString());
    }
  }
}

Node.js

Before trying this sample, follow the Node.js setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Node.js API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

// Import the Google Cloud client libraries
const {BigQuery} = require('@google-cloud/bigquery');
const {Storage} = require('@google-cloud/storage');

// Instantiate clients
const bigquery = new BigQuery();
const storage = new Storage();

/**
 * This sample loads the CSV file at
 * https://storage.googleapis.com/cloud-samples-data/bigquery/us-states/us-states.csv
 *
 * TODO(developer): Replace the following lines with the path to your file.
 */
const bucketName = 'cloud-samples-data';
const filename = 'bigquery/us-states/us-states-by-date.csv';

async function loadTablePartitioned() {
  // Load data into a table that uses column-based time partitioning.

  /**
   * TODO(developer): Uncomment the following lines before running the sample.
   */
  // const datasetId = 'my_dataset';
  // const tableId = 'my_new_table';

  // Configure the load job. For full list of options, see:
  // https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobConfigurationLoad
  const partitionConfig = {
    type: 'DAY',
    expirationMs: '7776000000', // 90 days
    field: 'date',
  };

  const metadata = {
    sourceFormat: 'CSV',
    skipLeadingRows: 1,
    schema: {
      fields: [
        {name: 'name', type: 'STRING'},
        {name: 'post_abbr', type: 'STRING'},
        {name: 'date', type: 'DATE'},
      ],
    },
    location: 'US',
    timePartitioning: partitionConfig,
  };

  // Load data from a Google Cloud Storage file into the table
  const [job] = await bigquery
    .dataset(datasetId)
    .table(tableId)
    .load(storage.bucket(bucketName).file(filename), metadata);

  // load() waits for the job to finish
  console.log(`Job ${job.id} completed.`);
}

Python

Before trying this sample, follow the Python setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Python API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

# TODO(developer): Set table_id to the ID of the table to create.
# table_id = "your-project.your_dataset.your_table_name"

job_config = bigquery.LoadJobConfig(
    schema=[
        bigquery.SchemaField("name", "STRING"),
        bigquery.SchemaField("post_abbr", "STRING"),
        bigquery.SchemaField("date", "DATE"),
    ],
    skip_leading_rows=1,
    time_partitioning=bigquery.TimePartitioning(
        type_=bigquery.TimePartitioningType.DAY,
        field="date",  # Name of the column to use for partitioning.
        expiration_ms=7776000000,  # 90 days.
    ),
)
uri = "gs://cloud-samples-data/bigquery/us-states/us-states-by-date.csv"

load_job = client.load_table_from_uri(
    uri, table_id, job_config=job_config
)  # Make an API request.

load_job.result()  # Wait for the job to complete.

table = client.get_table(table_id)
print("Loaded {} rows to table {}".format(table.num_rows, table_id))

Appending to or overwriting a table with CSV data

You can load additional data into a table either from source files or by appending query results.

In the Google Cloud console, use the Write preference option to specify what action to take when you load data from a source file or from a query result.

You have the following options when you load additional data into a table:

Console option | bq tool flag | BigQuery API property | Description
Write if empty | Not supported | WRITE_EMPTY | Writes the data only if the table is empty.
Append to table | --noreplace or --replace=false; if --[no]replace is unspecified, the default is append | WRITE_APPEND | (Default) Appends the data to the end of the table.
Overwrite table | --replace or --replace=true | WRITE_TRUNCATE | Erases all existing data in a table before writing the new data. This action also deletes the table schema and row-level security, and removes any Cloud KMS key.

If you load data into an existing table, the load job can append the data oroverwrite the table.
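
To make the mapping concrete, the following sketch (placeholder names) shows how the three write preferences map onto the Python client library's WriteDisposition values:

from google.cloud import bigquery

client = bigquery.Client()

# Choose WRITE_EMPTY (Write if empty), WRITE_APPEND (Append to table),
# or WRITE_TRUNCATE (Overwrite table).
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
)

load_job = client.load_table_from_uri(
    "gs://mybucket/mydata.csv",  # placeholder URI
    "mydataset.mytable",         # placeholder table
    job_config=job_config,
)
load_job.result()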

Note: This page does not cover appending or overwriting partitioned tables. For information on appending and overwriting partitioned tables, see Appending to and overwriting partitioned table data.

Console

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the left pane, click Explorer.
  3. In the Explorer pane, expand your project, click Datasets, and then select a dataset.
  4. In the Dataset info section, click Create table.
  5. In the Create table pane, specify the following details:
    1. In the Source section, select Google Cloud Storage in the Create table from list. Then, do the following:
      1. Select a file from the Cloud Storage bucket, or enter the Cloud Storage URI. You cannot include multiple URIs in the Google Cloud console, but wildcards are supported. The Cloud Storage bucket must be in the same location as the dataset that contains the table you want to create, append, or overwrite.
      2. For File format, select CSV.
    2. Note: It is possible to modify the table's schema when you append or overwrite it. For more information about supported schema changes during a load operation, see Modifying table schemas.
    3. In the Destination section, specify the following details:
      1. For Dataset, select the dataset in which you want to create the table.
      2. In the Table field, enter the name of the table that you want to create.
      3. Verify that the Table type field is set to Native table.
    4. In the Schema section, enter the schema definition. To enable the auto detection of a schema, select Auto detect. You can also enter schema information manually by clicking Edit as text or Add field.
    5. Optional: Specify Partition and cluster settings. For more information, see Creating partitioned tables and Creating and using clustered tables. You cannot convert a table to a partitioned or clustered table by appending or overwriting it. The Google Cloud console does not support appending to or overwriting partitioned or clustered tables in a load job.
    6. Click Advanced options and do the following:
      • For Write preference, choose Append to table or Overwrite table.
      • For Number of errors allowed, accept the default value of 0 or enter the maximum number of rows containing errors that can be ignored. If the number of rows with errors exceeds this value, the job results in an invalid message and fails. This option applies only to CSV and JSON files.
      • For Time zone, enter the default time zone to apply when parsing timestamp values that have no specific time zone. Check here for more valid time zone names. If this value is not present, timestamp values without a specific time zone are parsed using the default time zone, UTC. (Preview).
      • For Date Format, enter the format elements that define how the DATE values are formatted in the input files. This field expects a SQL-style format (for example, MM/DD/YYYY). If this value is present, this format is the only compatible DATE format. Schema autodetection also decides the DATE column type based on this format instead of the existing format. If this value is not present, the DATE field is parsed with the default formats. (Preview).
      • For Datetime Format, enter the format elements that define how the DATETIME values are formatted in the input files. This field expects a SQL-style format (for example, MM/DD/YYYY HH24:MI:SS.FF3). If this value is present, this format is the only compatible DATETIME format. Schema autodetection also decides the DATETIME column type based on this format instead of the existing format. If this value is not present, the DATETIME field is parsed with the default formats. (Preview).
      • For Time Format, enter the format elements that define how the TIME values are formatted in the input files. This field expects a SQL-style format (for example, HH24:MI:SS.FF3). If this value is present, this format is the only compatible TIME format. Schema autodetection also decides the TIME column type based on this format instead of the existing format. If this value is not present, the TIME field is parsed with the default formats. (Preview).
      • For Timestamp Format, enter the format elements that define how the TIMESTAMP values are formatted in the input files. This field expects a SQL-style format (for example, MM/DD/YYYY HH24:MI:SS.FF3). If this value is present, this format is the only compatible TIMESTAMP format. Schema autodetection also decides the TIMESTAMP column type based on this format instead of the existing format. If this value is not present, the TIMESTAMP field is parsed with the default formats. (Preview).
      • If you want to ignore values in a row that are not present in the table's schema, then select Unknown values.
      • For Field delimiter, choose the character that separates the cells in your CSV file: Comma, Tab, Pipe, or Custom. If you choose Custom, enter the delimiter in the Custom field delimiter box. The default value is Comma.
      • For Source column match, choose one of the following strategies used to match the loaded columns to the schema (Preview).
        • Default: Default behavior is chosen based on how the schema is provided. If autodetect is enabled, then the default behavior is to match columns by name. Otherwise, the default is to match columns by position. This is done to keep the behavior backward-compatible.
        • Position: Matches columns by position, assuming that the columns are ordered the same way as the schema.
        • Name: Matches by name by reading the header row as the column names and reordering columns to match the field names in the schema. Column names are read from the last skipped row based on Header rows to skip.
      • For Header rows to skip, enter the number of header rows to skip at the top of the CSV file. The default value is 0.
      • For Quoted newlines, check Allow quoted newlines to allow quoted data sections that contain newline characters in a CSV file. The default value is false.
      • For Jagged rows, check Allow jagged rows to accept rows in CSV files that are missing trailing optional columns. The missing values are treated as nulls. If unchecked, records with missing trailing columns are treated as bad records, and if there are too many bad records, an invalid error is returned in the job result. The default value is false.
      • For Null markers, enter a list of custom strings that represent NULL values in CSV data. (Preview).
      • For Encryption, click Customer-managed key to use a Cloud Key Management Service key. If you leave the Google-managed key setting, BigQuery encrypts the data at rest.
    7. Click Create table.

SQL

Use the LOAD DATA DDL statement. The following example appends a CSV file to the table mytable:

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the query editor, enter the following statement:

    LOAD DATA INTO mydataset.mytable
    FROM FILES (
      format = 'CSV',
      uris = ['gs://bucket/path/file.csv']);

  3. Click Run.

For more information about how to run queries, see Run an interactive query.

bq

Use the bq load command, specify CSV using the --source_format flag, and include a Cloud Storage URI. You can include a single URI, a comma-separated list of URIs, or a URI containing a wildcard.

Supply the schema inline, in a schema definition file, or use schema auto-detect. If you don't specify a schema, and --autodetect is false, and the destination table exists, then the schema of the destination table is used.

Specify the --replace flag to overwrite the table. Use the --noreplace flag to append data to the table. If no flag is specified, the default is to append data.

It is possible to modify the table's schema when you append or overwrite it. For more information on supported schema changes during a load operation, see Modifying table schemas.
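
As a rough illustration, the Python client library exposes this kind of schema change through schema_update_options. The sketch below (placeholder names) appends a CSV while allowing the load job to add new nullable columns to the destination schema:

from google.cloud import bigquery

client = bigquery.Client()

# Append and allow the load job to add new nullable columns to the table schema.
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    schema_update_options=[bigquery.SchemaUpdateOption.ALLOW_FIELD_ADDITION],
)

load_job = client.load_table_from_uri(
    "gs://mybucket/mydata.csv",  # placeholder URI
    "mydataset.mytable",         # placeholder table
    job_config=job_config,
)
load_job.result()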

(Optional) Supply the --location flag and set the value to your location.

Other optional flags include:

  • --allow_jagged_rows: When specified, accept rows in CSV files that are missing trailing optional columns. The missing values are treated as nulls. Otherwise, records with missing trailing columns are treated as bad records, and if there are too many bad records, an invalid error is returned in the job result. The default value is false.
  • --allow_quoted_newlines: When specified, allows quoted data sections that contain newline characters in a CSV file. The default value is false.
  • --field_delimiter: The character that indicates the boundary between columns in the data. Both \t and tab are allowed for tab delimiters. The default value is , (comma).
  • --null_marker: An optional custom string that represents a NULL value in CSV data.
  • --null_markers: (Preview) An optional comma-separated list of custom strings that represent NULL values in CSV data. This option cannot be used with the --null_marker flag.
  • --source_column_match: (Preview) Specifies the strategy used to match loaded columns to the schema. You can specify POSITION to match loaded columns by position, assuming that the columns are ordered the same way as the schema. You can also specify NAME to match by name by reading the header row as the column names and reordering columns to match the field names in the schema. If this value is unspecified, then the default is based on how the schema is provided. If --autodetect is enabled, then the default behavior is to match columns by name. Otherwise, the default is to match columns by position.
  • --skip_leading_rows: Specifies the number of header rows to skip at the top of the CSV file. The default value is 0.
  • --quote: The quote character to use to enclose records. The default value is ". To indicate no quote character, use an empty string.
  • --max_bad_records: An integer that specifies the maximum number of bad records allowed before the entire job fails. The default value is 0. At most, five errors of any type are returned regardless of the --max_bad_records value.
  • --ignore_unknown_values: When specified, allows and ignores extra, unrecognized values in CSV or JSON data.
  • --time_zone: (Preview) An optional default time zone to apply when parsing timestamp values that have no specific time zone in CSV or JSON data.
  • --date_format: (Preview) An optional custom string that defines how the DATE values are formatted in CSV or JSON data.
  • --datetime_format: (Preview) An optional custom string that defines how the DATETIME values are formatted in CSV or JSON data.
  • --time_format: (Preview) An optional custom string that defines how the TIME values are formatted in CSV or JSON data.
  • --timestamp_format: (Preview) An optional custom string that defines how the TIMESTAMP values are formatted in CSV or JSON data.
  • --autodetect: When specified, enables schema auto-detection for CSV and JSON data.
  • --destination_kms_key: The Cloud KMS key for encryption of the table data.
bq --location=location load \
--[no]replace \
--source_format=format \
dataset.table \
path_to_source \
schema

where:

  • location is your location. The --location flag is optional. You can set a default value for the location using the .bigqueryrc file.
  • format is CSV.
  • dataset is an existing dataset.
  • table is the name of the table into which you're loading data.
  • path_to_source is a fully qualified Cloud Storage URI or a comma-separated list of URIs. Wildcards are also supported.
  • schema is a valid schema. The schema can be a local JSON file, or it can be typed inline as part of the command. You can also use the --autodetect flag instead of supplying a schema definition.

Examples:

The following command loads data from gs://mybucket/mydata.csv and overwrites a table named mytable in mydataset. The schema is defined using schema auto-detection.

bq load \
--autodetect \
--replace \
--source_format=CSV \
mydataset.mytable \
gs://mybucket/mydata.csv

The following command loads data from gs://mybucket/mydata.csv and appends data to a table named mytable in mydataset. The schema is defined using a JSON schema file named myschema.json.

bq load \
--noreplace \
--source_format=CSV \
mydataset.mytable \
gs://mybucket/mydata.csv \
./myschema.json

API

  1. Create a load job that points to the source data in Cloud Storage.

  2. (Optional) Specify your location in the location property in the jobReference section of the job resource.

  3. The source URIs property must be fully qualified, in the format gs://bucket/object. You can include multiple URIs as a comma-separated list. Note that wildcards are also supported.

  4. Specify the data format by setting the configuration.load.sourceFormat property to CSV.

  5. Specify the write preference by setting the configuration.load.writeDisposition property to WRITE_TRUNCATE or WRITE_APPEND.

Go

Before trying this sample, follow the Go setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Go API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

import("context""fmt""cloud.google.com/go/bigquery")// importCSVTruncate demonstrates loading data from CSV data in Cloud Storage and overwriting/truncating// data in the existing table.funcimportCSVTruncate(projectID,datasetID,tableIDstring)error{// projectID := "my-project-id"// datasetID := "mydataset"// tableID := "mytable"ctx:=context.Background()client,err:=bigquery.NewClient(ctx,projectID)iferr!=nil{returnfmt.Errorf("bigquery.NewClient: %v",err)}deferclient.Close()gcsRef:=bigquery.NewGCSReference("gs://cloud-samples-data/bigquery/us-states/us-states.csv")gcsRef.SourceFormat=bigquery.CSVgcsRef.AutoDetect=truegcsRef.SkipLeadingRows=1loader:=client.Dataset(datasetID).Table(tableID).LoaderFrom(gcsRef)loader.WriteDisposition=bigquery.WriteTruncatejob,err:=loader.Run(ctx)iferr!=nil{returnerr}status,err:=job.Wait(ctx)iferr!=nil{returnerr}ifstatus.Err()!=nil{returnfmt.Errorf("job completed with error: %v",status.Err())}returnnil}

Java

Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryException;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.FormatOptions;
import com.google.cloud.bigquery.Job;
import com.google.cloud.bigquery.JobInfo;
import com.google.cloud.bigquery.JobInfo.WriteDisposition;
import com.google.cloud.bigquery.LoadJobConfiguration;
import com.google.cloud.bigquery.TableId;

// Sample to overwrite the BigQuery table data by loading a CSV file from GCS
public class LoadCsvFromGcsTruncate {

  public static void runLoadCsvFromGcsTruncate() throws Exception {
    // TODO(developer): Replace these variables before running the sample.
    String datasetName = "MY_DATASET_NAME";
    String tableName = "MY_TABLE_NAME";
    String sourceUri = "gs://cloud-samples-data/bigquery/us-states/us-states.csv";
    loadCsvFromGcsTruncate(datasetName, tableName, sourceUri);
  }

  public static void loadCsvFromGcsTruncate(String datasetName, String tableName, String sourceUri)
      throws Exception {
    try {
      // Initialize client that will be used to send requests. This client only needs to be created
      // once, and can be reused for multiple requests.
      BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();

      TableId tableId = TableId.of(datasetName, tableName);

      LoadJobConfiguration configuration =
          LoadJobConfiguration.builder(tableId, sourceUri)
              .setFormatOptions(FormatOptions.csv())
              // Set the write disposition to overwrite existing table data.
              .setWriteDisposition(WriteDisposition.WRITE_TRUNCATE)
              .build();

      // For more information on Job see:
      // https://googleapis.dev/java/google-cloud-clients/latest/index.html?com/google/cloud/bigquery/package-summary.html
      // Load the table
      Job loadJob = bigquery.create(JobInfo.of(configuration));

      // Load data from the GCS CSV file into the table.
      // Blocks until this load table job completes its execution, either failing or succeeding.
      Job completedJob = loadJob.waitFor();

      // Check for errors
      if (completedJob == null) {
        throw new Exception("Job not executed since it no longer exists.");
      } else if (completedJob.getStatus().getError() != null) {
        // You can also look at queryJob.getStatus().getExecutionErrors() for all
        // errors, not just the latest one.
        throw new Exception(
            "BigQuery was unable to load into the table due to an error: \n"
                + loadJob.getStatus().getError());
      }
      System.out.println("Table is successfully overwritten by CSV file loaded from GCS");
    } catch (BigQueryException | InterruptedException e) {
      System.out.println("Table was not overwritten. \n" + e.toString());
    }
  }
}

Node.js

Before trying this sample, follow the Node.js setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Node.js API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

To replace the rows in an existing table, set the writeDisposition value in the metadata parameter to 'WRITE_TRUNCATE'.

// Import the Google Cloud client libraries
const {BigQuery} = require('@google-cloud/bigquery');
const {Storage} = require('@google-cloud/storage');

// Instantiate clients
const bigquery = new BigQuery();
const storage = new Storage();

/**
 * This sample loads the CSV file at
 * https://storage.googleapis.com/cloud-samples-data/bigquery/us-states/us-states.csv
 *
 * TODO(developer): Replace the following lines with the path to your file.
 */
const bucketName = 'cloud-samples-data';
const filename = 'bigquery/us-states/us-states.csv';

async function loadCSVFromGCSTruncate() {
  /**
   * Imports a GCS file into a table and overwrites
   * table data if table already exists.
   */

  /**
   * TODO(developer): Uncomment the following lines before running the sample.
   */
  // const datasetId = 'my_dataset';
  // const tableId = 'my_table';

  // Configure the load job. For full list of options, see:
  // https://cloud.google.com/bigquery/docs/reference/rest/v2/Job#JobConfigurationLoad
  const metadata = {
    sourceFormat: 'CSV',
    skipLeadingRows: 1,
    schema: {
      fields: [
        {name: 'name', type: 'STRING'},
        {name: 'post_abbr', type: 'STRING'},
      ],
    },
    // Set the write disposition to overwrite existing table data.
    writeDisposition: 'WRITE_TRUNCATE',
    location: 'US',
  };

  // Load data from a Google Cloud Storage file into the table
  const [job] = await bigquery
    .dataset(datasetId)
    .table(tableId)
    .load(storage.bucket(bucketName).file(filename), metadata);

  // load() waits for the job to finish
  console.log(`Job ${job.id} completed.`);

  // Check the job's status for errors
  const errors = job.status.errors;
  if (errors && errors.length > 0) {
    throw errors;
  }
}

PHP

Before trying this sample, follow the PHP setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery PHP API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

use Google\Cloud\BigQuery\BigQueryClient;
use Google\Cloud\Core\ExponentialBackoff;

/** Uncomment and populate these variables in your code */
// $projectId = 'The Google project ID';
// $datasetId = 'The BigQuery dataset ID';
// $tableId = 'The BigQuery table ID';

// instantiate the bigquery table service
$bigQuery = new BigQueryClient([
    'projectId' => $projectId,
]);
$table = $bigQuery->dataset($datasetId)->table($tableId);

// create the import job
$gcsUri = 'gs://cloud-samples-data/bigquery/us-states/us-states.csv';
$loadConfig = $table->loadFromStorage($gcsUri)->skipLeadingRows(1)->writeDisposition('WRITE_TRUNCATE');
$job = $table->runJob($loadConfig);

// poll the job until it is complete
$backoff = new ExponentialBackoff(10);
$backoff->execute(function () use ($job) {
    print('Waiting for job to complete' . PHP_EOL);
    $job->reload();
    if (!$job->isComplete()) {
        throw new Exception('Job has not yet completed', 500);
    }
});

// check if the job has errors
if (isset($job->info()['status']['errorResult'])) {
    $error = $job->info()['status']['errorResult']['message'];
    printf('Error running job: %s' . PHP_EOL, $error);
} else {
    print('Data imported successfully' . PHP_EOL);
}

Python

Before trying this sample, follow the Python setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Python API reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.

To replace the rows in an existing table, set the LoadJobConfig.write_disposition property to the WriteDisposition constant WRITE_TRUNCATE.

import six

from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

# TODO(developer): Set table_id to the ID of the table to create.
# table_id = "your-project.your_dataset.your_table_name"

job_config = bigquery.LoadJobConfig(
    schema=[
        bigquery.SchemaField("name", "STRING"),
        bigquery.SchemaField("post_abbr", "STRING"),
    ],
)

body = six.BytesIO(b"Washington,WA")
client.load_table_from_file(body, table_id, job_config=job_config).result()
previous_rows = client.get_table(table_id).num_rows
assert previous_rows > 0

job_config = bigquery.LoadJobConfig(
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
)

uri = "gs://cloud-samples-data/bigquery/us-states/us-states.csv"
load_job = client.load_table_from_uri(
    uri, table_id, job_config=job_config
)  # Make an API request.

load_job.result()  # Waits for the job to complete.

destination_table = client.get_table(table_id)
print("Loaded {} rows.".format(destination_table.num_rows))

Loading hive-partitioned CSV data

BigQuery supports loading hive-partitioned CSV data stored on Cloud Storage and will populate the hive partitioning columns as columns in the destination BigQuery managed table. For more information, see Loading Externally Partitioned Data from Cloud Storage.
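
As a rough sketch, assuming your files follow a hive-style layout such as gs://mybucket/mytable/dt=2024-01-01/data.csv (the bucket layout, dataset, and table names here are placeholders), a bq load command with hive partitioning options might look like the following:

bq load \
--source_format=CSV \
--autodetect \
--hive_partitioning_mode=AUTO \
--hive_partitioning_source_uri_prefix=gs://mybucket/mytable \
mydataset.mytable \
"gs://mybucket/mytable/*.csv"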

Details of loading CSV data

This section describes how BigQuery handles various CSV formatting options.

Encoding

BigQuery expects CSV data to be UTF-8 encoded. If you have CSV files with other supported encoding types, you should explicitly specify the encoding so that BigQuery can properly convert the data to UTF-8.

BigQuery supports the following encoding types for CSV files:

  • UTF-8
  • ISO-8859-1
  • UTF-16BE (UTF-16 Big Endian)
  • UTF-16LE (UTF-16 Little Endian)
  • UTF-32BE (UTF-32 Big Endian)
  • UTF-32LE (UTF-32 Little Endian)

If you don't specify an encoding, or if you specify UTF-8 encoding when the CSV file is not UTF-8 encoded, BigQuery attempts to convert the data to UTF-8. Generally, if the CSV file is ISO-8859-1 encoded, your data will be loaded successfully, but it may not exactly match what you expect. If the CSV file is UTF-16BE, UTF-16LE, UTF-32BE, or UTF-32LE encoded, the load might fail. To avoid unexpected failures, specify the correct encoding by using the --encoding flag.
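
For example, the following sketch loads an ISO-8859-1 (Latin-1) encoded file by declaring the encoding explicitly; the dataset, table, bucket, and schema file names are placeholders:

bq load \
--source_format=CSV \
--encoding=ISO-8859-1 \
--skip_leading_rows=1 \
mydataset.mytable \
gs://mybucket/latin1_data.csv \
./myschema.json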

Note: If the CSV file is UTF-16BE, UTF-16LE, UTF-32BE, or UTF-32LE encoded, and the --allow_quoted_newlines flag is set to true, then the CSV file has a maximum size limit of 1 GB.

Note: By default, if the CSV file contains the ASCII 0 (NULL) character, you can't load the data into BigQuery. If you want to allow ASCII 0 and other ASCII control characters, then set --preserve_ascii_control_characters=true for your load jobs.

If BigQuery can't convert a character other than the ASCII 0 character, BigQuery converts the character to the standard Unicode replacement character: �.

Field delimiters

Delimiters in CSV files can be any single-byte character. If the source file uses ISO-8859-1 encoding, any character can be a delimiter. If the source file uses UTF-8 encoding, any character in the decimal range 1-127 (U+0001-U+007F) can be used without modification. You can insert an ISO-8859-1 character outside of this range as a delimiter, and BigQuery will interpret it correctly. However, if you use a multibyte character as a delimiter, some of the bytes will be interpreted incorrectly as part of the field value.

Generally, it's a best practice to use a standard delimiter, such as a tab, pipe, or comma. The default is a comma.
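
For example, the following sketch loads a pipe-delimited file by setting the field delimiter explicitly; the dataset, table, and bucket names are placeholders:

bq load \
--source_format=CSV \
--field_delimiter='|' \
--autodetect \
mydataset.mytable \
gs://mybucket/pipe_delimited_data.csv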

Data types

Boolean. BigQuery can parse any of the following pairs for Boolean data: 1 or 0, true or false, t or f, yes or no, or y or n (all case insensitive). Schema autodetection automatically detects any of these except 0 and 1.

Bytes. Columns with BYTES types must be encoded as Base64.

Date. Columns with DATE types must be in the format YYYY-MM-DD.

Datetime. Columns with DATETIME types must be in the format YYYY-MM-DD HH:MM:SS[.SSSSSS].

Geography. Columns with GEOGRAPHY types must contain strings in one of the following formats:

  • Well-known text (WKT)
  • Well-known binary (WKB)
  • GeoJSON

If you use WKB, the value should be hex encoded.

The following list shows examples of valid data:

  • WKT: POINT(1 2)
  • GeoJSON: { "type": "Point", "coordinates": [1, 2] }
  • Hex encoded WKB: 0101000000feffffffffffef3f0000000000000040

Before loading GEOGRAPHY data, also read Loading geospatial data.

Interval. Columns with INTERVAL types must be in the format Y-M D H:M:S[.F], where:

  • Y = Year. Supported range is 0-10,000.
  • M = Month. Supported range is 1-12.
  • D = Day. Supported range is 1-[last day of the indicated month].
  • H = Hour.
  • M = Minute.
  • S = Second.
  • [.F] = Fractions of a second up to six digits, with microsecond precision.

You can indicate a negative value by prepending a dash (-).

The following list shows examples of valid data:

  • 10-6 0 0:0:0
  • 0-0 -5 0:0:0
  • 0-0 0 0:0:1.25

To load INTERVAL data, you must use the bq load command and use the --schema flag to specify a schema. You can't upload INTERVAL data by using the console.
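
For example, assuming a schema file named myschema.json with the following contents (the column names are placeholders):

[
  {"name": "id", "type": "INTEGER"},
  {"name": "task_duration", "type": "INTERVAL"}
]

a sketch of the load command for a CSV file that contains an INTERVAL column might look like the following:

bq load \
--source_format=CSV \
--schema=./myschema.json \
mydataset.mytable \
gs://mybucket/interval_data.csv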

JSON. Quotes are escaped by using the two-character sequence "". For more information, see the example of loading JSON data from a CSV file.
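
For example, a CSV row with a JSON column might look like the following, with each double quote inside the JSON value doubled and the whole value enclosed in quotes (the two-column layout here is hypothetical):

1,"{""name"": ""Alice"", ""age"": 30}"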

Time. Columns with TIME types must be in the format HH:MM:SS[.SSSSSS].

Timestamp. BigQuery accepts various timestamp formats. The timestamp must include a date portion and a time portion.

  • The date portion can be formatted as YYYY-MM-DD or YYYY/MM/DD.

  • The timestamp portion must be formatted as HH:MM[:SS[.SSSSSS]] (seconds and fractions of seconds are optional).

  • The date and time must be separated by a space or 'T'.

  • Optionally, the date and time can be followed by a UTC offset or the UTC zone designator (Z). For more information, see Time zones.

For example, any of the following are valid timestamp values:

  • 2018-08-19 12:11
  • 2018-08-19 12:11:35
  • 2018-08-19 12:11:35.22
  • 2018/08/19 12:11
  • 2018-07-05 12:54:00 UTC
  • 2018-08-19 07:11:35.220 -05:00
  • 2018-08-19T12:11:35.220Z

If you provide a schema, BigQuery also accepts Unix epoch time for timestamp values. However, schema autodetection doesn't detect this case, and treats the value as a numeric or string type instead.

Examples of Unix epoch timestamp values:

  • 1534680695
  • 1.534680695e12

RANGE. Represented in CSV files in the format [LOWER_BOUND, UPPER_BOUND), where LOWER_BOUND and UPPER_BOUND are valid DATE, DATETIME, or TIMESTAMP strings. NULL and UNBOUNDED represent unbounded start or end values.

Note: Because the range CSV format contains a comma, if the CSV delimiter is a comma, the range value must be surrounded by double quotes for the CSV file to be valid.

The following are examples of CSV values for RANGE<DATE>:

  • "[2020-01-01, 2021-01-01)"
  • "[UNBOUNDED, 2021-01-01)"
  • "[2020-03-01, NULL)"
  • "[UNBOUNDED, UNBOUNDED)"

Schema auto-detection

This section describes the behavior of schema auto-detection when loading CSV files.

CSV delimiter

BigQuery detects the following delimiters:

  • comma ( , )
  • pipe ( | )
  • tab ( \t )

CSV header

BigQuery infers headers by comparing the first row of the file with other rows in the file. If the first line contains only strings, and the other lines contain other data types, BigQuery assumes that the first row is a header row. BigQuery assigns column names based on the field names in the header row. The names might be modified to meet the naming rules for columns in BigQuery. For example, spaces are replaced with underscores.

Otherwise, BigQuery assumes the first row is a data row, and assigns generic column names such as string_field_1. Note that the load job doesn't update the column names in the schema after the table is created, although you can change the names manually after the table is created. Another option is to provide an explicit schema instead of using autodetection.

You might have a CSV file with a header row, where all of the data fields are strings. In that case, BigQuery will not automatically detect that the first row is a header. Use the --skip_leading_rows option to skip the header row. Otherwise, the header is imported as data. Also consider providing an explicit schema in this case, so that you can assign column names, as in the following example.
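
For example, the following sketch skips the header row of an all-string file and supplies an inline schema so that the columns get meaningful names; the dataset, table, bucket, and field names are placeholders:

bq load \
--source_format=CSV \
--skip_leading_rows=1 \
mydataset.mytable \
gs://mybucket/all_string_data.csv \
name:STRING,city:STRING,state:STRING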

CSV quoted new lines

BigQuery detects quoted newline characters within a CSV field and does not interpret the quoted newline character as a row boundary.

Troubleshoot parsing errors

If there's a problem parsing your CSV files, then the load job's errors resource is populated with the error details.

Generally, these errors identify the start of the problematic line with a byte offset. For uncompressed files, you can use the gcloud storage cat command with a byte range to read the relevant line.

For example, you run the bq load command and receive an error:

bq load \
    --skip_leading_rows=1 \
    --source_format=CSV \
    mydataset.mytable \
    gs://my-bucket/mytable.csv \
    'Number:INTEGER,Name:STRING,TookOffice:STRING,LeftOffice:STRING,Party:STRING'

The error in the output is similar to the following:

Waiting on bqjob_r5268069f5f49c9bf_0000018632e903d7_1 ... (0s) Current status: DONE
BigQuery error in load operation: Error processing job
'myproject:bqjob_r5268069f5f49c9bf_0000018632e903d7_1': Error while reading
data, error message: Error detected while parsing row starting at position: 1405.
Error: Data between close quote character (") and field separator.
File: gs://my-bucket/mytable.csv
Failure details:
- gs://my-bucket/mytable.csv: Error while reading data,
  error message: Error detected while parsing row starting at
  position: 1405. Error: Data between close quote character (") and
  field separator. File: gs://my-bucket/mytable.csv
- Error while reading data, error message: CSV processing encountered
  too many errors, giving up. Rows: 22; errors: 1; max bad: 0; error
  percent: 0

Based on the preceding error, there's a format error in the file. To view the file's content, run the gcloud storage cat command:

gcloud storage cat -r 1405-1505 gs://my-bucket/mytable.csv

The output is similar to the following:

16,Abraham Lincoln,"March 4, 1861","April 15, "1865,Republican
18,Ulysses S. Grant,"March 4, 1869",...

Based on the output of the file, the problem is a misplaced quote in "April 15, "1865.

Compressed CSV files

Debugging parsing errors is more challenging for compressed CSV files, since the reported byte offset refers to the location in the uncompressed file. The following gcloud storage cat command streams the file from Cloud Storage, decompresses the file, identifies the appropriate byte offset, and prints the line with the format error:

gcloud storage cat gs://my-bucket/mytable.csv.gz | gunzip - | tail -c+1406 | head -n 1

The output is similar to the following:

16,Abraham Lincoln,"March 4, 1861","April 15, "1865,Republican

Troubleshoot quota errors

Use the information in this section to troubleshoot quota or limit errors related to loading CSV files into BigQuery.

Loading CSV files quota errors

If you load a large CSV file using the bq load command with the --allow_quoted_newlines flag, you might encounter this error.

Error message

Input CSV files are not splittable and at least one of the files is larger than the maximum allowed size. Size is: ...

Resolution

To resolve this quota error, do the following:

  • Set the --allow_quoted_newlines flag to false.
  • Split the CSV file into smaller chunks that are each less than 4 GB, as in the sketch following this list.
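
For example, assuming the file has no header row and no quoted newlines (consistent with setting --allow_quoted_newlines to false), a rough sketch of splitting and reloading the file looks like the following. The file names are placeholders, and the line count is a hypothetical value you would tune so that each chunk stays under 4 GB:

split -l 10000000 large_file.csv part_
gcloud storage cp part_* gs://mybucket/parts/
bq load --source_format=CSV --autodetect mydataset.mytable "gs://mybucket/parts/part_*"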

For more information about limits that apply when you load data into BigQuery, see Load jobs.

CSV options

To change how BigQuery parses CSV data, specify additional options in the Google Cloud console, the bq command-line tool, or the API.

For more information on the CSV format, see RFC 4180.

CSV option | Console option | bq tool flag | BigQuery API property | Description
Field delimiter | Field delimiter: Comma, Tab, Pipe, Custom | -F or --field_delimiter | fieldDelimiter (Java, Python) | (Optional) The separator for fields in a CSV file. The separator can be any ISO-8859-1 single-byte character. BigQuery converts the string to ISO-8859-1 encoding, and uses the first byte of the encoded string to split the data in its raw, binary state. BigQuery also supports the escape sequence "\t" to specify a tab separator. The default value is a comma (`,`).
Header rows | Header rows to skip | --skip_leading_rows | skipLeadingRows (Java, Python) | (Optional) An integer indicating the number of header rows in the source data.
Source column match | Source column match: Default, Position, Name | --source_column_match | None | (Preview) (Optional) This controls the strategy used to match loaded columns to the schema. Supported values include:
  • POSITION: matches by position. This option assumes that the columns are ordered the same way as the schema.
  • NAME: matches by name. This option reads the header row as column names and reorders columns to match the field names in the schema. Column names are read from the last skipped row based on the skipLeadingRows property.
If this value is unspecified, then the default is based on how the schema is provided. If autodetect is enabled, then the default behavior is to match columns by name. Otherwise, the default is to match columns by position. This is done to keep the behavior backward-compatible.
Number of bad records allowed | Number of errors allowed | --max_bad_records | maxBadRecords (Java, Python) | (Optional) The maximum number of bad records that BigQuery can ignore when running the job. If the number of bad records exceeds this value, an invalid error is returned in the job result. The default value is 0, which requires that all records are valid.
Newline characters | Allow quoted newlines | --allow_quoted_newlines | allowQuotedNewlines (Java, Python) | (Optional) Indicates whether to allow quoted data sections that contain newline characters in a CSV file. The default value is false.
Custom null values | None | --null_marker | nullMarker (Java, Python) | (Optional) Specifies a string that represents a null value in a CSV file. For example, if you specify "\N", BigQuery interprets "\N" as a null value when loading a CSV file. The default value is the empty string. If you set this property to a custom value, BigQuery throws an error if an empty string is present for all data types except for STRING and BYTE. For STRING and BYTE columns, BigQuery interprets the empty string as an empty value.
Trailing optional columns | Allow jagged rows | --allow_jagged_rows | allowJaggedRows (Java, Python) | (Optional) Accept rows that are missing trailing optional columns. The missing values are treated as nulls. If false, records with missing trailing columns are treated as bad records, and if there are too many bad records, an invalid error is returned in the job result. The default value is false. Only applicable to CSV, ignored for other formats.
Unknown values | Ignore unknown values | --ignore_unknown_values | ignoreUnknownValues (Java, Python) | (Optional) Indicates if BigQuery should allow extra values that are not represented in the table schema. If true, the extra values are ignored. If false, records with extra columns are treated as bad records, and if there are too many bad records, an invalid error is returned in the job result. The default value is false. The sourceFormat property determines what BigQuery treats as an extra value:
  • CSV: Trailing columns
  • JSON: Named values that don't match any column names
Quote | Quote character: Double quote, Single quote, None, Custom | --quote | quote (Java, Python) | (Optional) The value that is used to quote data sections in a CSV file. BigQuery converts the string to ISO-8859-1 encoding, and then uses the first byte of the encoded string to split the data in its raw, binary state. The default value is a double quote ('"'). If your data does not contain quoted sections, set the property value to an empty string. If your data contains quoted newline characters, you must also set the allowQuotedNewlines property to true. To include the specific quote character within a quoted value, precede it with an additional matching quote character. For example, if you want to escape the default character ' " ', use ' "" '.
Encoding | None | -E or --encoding | encoding (Java, Python) | (Optional) The character encoding of the data. The supported values are UTF-8, ISO-8859-1, UTF-16BE, UTF-16LE, UTF-32BE, or UTF-32LE. The default value is UTF-8. BigQuery decodes the data after the raw, binary data has been split using the values of the quote and fieldDelimiter properties.
ASCII control character | None | --preserve_ascii_control_characters | None | (Optional) If you want to allow ASCII 0 and other ASCII control characters, then set --preserve_ascii_control_characters to true for your load jobs.
Null Markers | Null Markers | --null_markers | None | (Preview) (Optional) A list of custom strings that represent a NULL value in CSV data. This option cannot be used with the --null_marker option.
Time Zone | Time Zone | --time_zone | None | (Preview) (Optional) Default time zone that applies when parsing timestamp values that have no specific time zone. Check valid time zone names. If this value is not present, timestamp values without a specific time zone are parsed using the default time zone UTC.
Date Format | Date Format | --date_format | None | (Preview) (Optional) Format elements that define how the DATE values are formatted in the input files (for example, MM/DD/YYYY). If this value is present, this format is the only compatible DATE format. Schema autodetection will also decide the DATE column type based on this format instead of the existing format. If this value is not present, the DATE field is parsed with the default formats.
Datetime Format | Datetime Format | --datetime_format | None | (Preview) (Optional) Format elements that define how the DATETIME values are formatted in the input files (for example, MM/DD/YYYY HH24:MI:SS.FF3). If this value is present, this format is the only compatible DATETIME format. Schema autodetection will also decide the DATETIME column type based on this format instead of the existing format. If this value is not present, the DATETIME field is parsed with the default formats.
Time Format | Time Format | --time_format | None | (Preview) (Optional) Format elements that define how the TIME values are formatted in the input files (for example, HH24:MI:SS.FF3). If this value is present, this format is the only compatible TIME format. Schema autodetection will also decide the TIME column type based on this format instead of the existing format. If this value is not present, the TIME field is parsed with the default formats.
Timestamp Format | Timestamp Format | --timestamp_format | None | (Preview) (Optional) Format elements that define how the TIMESTAMP values are formatted in the input files (for example, MM/DD/YYYY HH24:MI:SS.FF3). If this value is present, this format is the only compatible TIMESTAMP format. Schema autodetection will also decide the TIMESTAMP column type based on this format instead of the existing format. If this value is not present, the TIMESTAMP field is parsed with the default formats.
