Use open source Python libraries
You can choose from among three Python libraries in BigQuery, based on your use case.
| Library | Use case | Maintained by | Description |
|---|---|---|---|
| BigQuery DataFrames | Python-based data processing and ML operations with server-side processing (for example, using slots) | Open source library maintained by Google | pandas and scikit-learn APIs implemented with server-side pushdown. For more information, see Introduction to BigQuery DataFrames. |
| pandas-gbq | Python-based data processing using client-side data copy | Open source library maintained by PyData and volunteer contributors | Lets you move data to and from Python DataFrames on the client side. For more information, see the documentation and source code. |
| google-cloud-bigquery | BigQuery deployment, administration, and SQL-based querying | Open source library maintained by Google | Python package that wraps all the BigQuery APIs. For more information, see the documentation and source code. |
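BigQuery DataFrames is not covered in the rest of this guide, but a minimal, illustrative sketch of its pandas-like API looks like the following. It assumes the open source bigframes package is installed and default application credentials are configured; see Introduction to BigQuery DataFrames for the authoritative examples.

```python
# Minimal, illustrative sketch of BigQuery DataFrames (the bigframes package).
# Assumes `pip install bigframes` and default application credentials.
import bigframes.pandas as bpd

# Read a public table; the data stays in BigQuery and operations are
# pushed down and executed server side (using slots), not on the client.
df = bpd.read_gbq("bigquery-public-data.usa_names.usa_1910_current")

# Filtering and aggregation also run in BigQuery.
tx_names = df[df["state"] == "TX"].groupby("name")["number"].sum()

# Materialize a small result on the client as a regular pandas object.
print(tx_names.head(10).to_pandas())
```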
Using pandas-gbq and google-cloud-bigquery
The pandas-gbq library provides a simple interface for running queries and uploading pandas DataFrames to BigQuery. It is a thin wrapper around the BigQuery client library, google-cloud-bigquery. Both of these libraries focus on helping you perform data analysis using SQL.
Install the libraries
To use the code samples in this guide, install the pandas-gbq package and the BigQuery Python client libraries.
Install the pandas-gbq and google-cloud-bigquery packages:
```sh
pip install --upgrade pandas-gbq 'google-cloud-bigquery[bqstorage,pandas]'
```

Running queries
Both libraries support querying data stored in BigQuery. Key differences between the libraries include:
| | pandas-gbq | google-cloud-bigquery |
|---|---|---|
| Default SQL syntax | GoogleSQL (configurable with pandas_gbq.context.dialect) | GoogleSQL |
| Query configurations | Sent as a dictionary in the format of a query request. | Use the QueryJobConfig class, which contains properties for the various API configuration options. |
Querying data with the GoogleSQL syntax
The following sample shows how to run a GoogleSQL query with and without explicitly specifying a project. For both libraries, if a project is not specified, the project will be determined from the default credentials.
Note: The pandas.read_gbq method defaults to legacy SQL. To use standard SQL, you must explicitly set the dialect parameter to 'standard', as shown.

pandas-gbq:
importpandassql=""" SELECT name FROM `bigquery-public-data.usa_names.usa_1910_current` WHERE state = 'TX' LIMIT 100"""# Run a Standard SQL query using the environment's default projectdf=pandas.read_gbq(sql,dialect="standard")# Run a Standard SQL query with the project set explicitlyproject_id="your-project-id"df=pandas.read_gbq(sql,project_id=project_id,dialect="standard")google-cloud-bigquery:
```python
from google.cloud import bigquery

client = bigquery.Client()
sql = """
    SELECT name
    FROM `bigquery-public-data.usa_names.usa_1910_current`
    WHERE state = 'TX'
    LIMIT 100
"""

# Run a Standard SQL query using the environment's default project
df = client.query(sql).to_dataframe()

# Run a Standard SQL query with the project set explicitly
project_id = "your-project-id"
df = client.query(sql, project=project_id).to_dataframe()
```

Querying data with the legacy SQL syntax
The following sample shows how to run a query using legacy SQL syntax. See the GoogleSQL migration guide for guidance on updating your queries to GoogleSQL.
pandas-gbq:
importpandassql=""" SELECT name FROM [bigquery-public-data:usa_names.usa_1910_current] WHERE state = 'TX' LIMIT 100"""df=pandas.read_gbq(sql,dialect="legacy")google-cloud-bigquery:
Using the BigQuery Storage API to download large results
Use the BigQuery Storage API to speed up downloads of large results by 15 to 31 times.
pandas-gbq:
importpandassql="SELECT * FROM `bigquery-public-data.irs_990.irs_990_2012`"# Use the BigQuery Storage API to download results more quickly.df=pandas.read_gbq(sql,dialect="standard",use_bqstorage_api=True)google-cloud-bigquery:
fromgoogle.cloudimportbigqueryclient=bigquery.Client()sql="SELECT * FROM `bigquery-public-data.irs_990.irs_990_2012`"# The client library uses the BigQuery Storage API to download results to a# pandas dataframe if the API is enabled on the project, the# `google-cloud-bigquery-storage` package is installed, and the `pyarrow`# package is installed.df=client.query(sql).to_dataframe()Running a query with a configuration
Sending a configuration with a BigQuery API request is required to perform certain complex operations, such as running a parameterized query or specifying a destination table to store the query results. In pandas-gbq, the configuration must be sent as a dictionary in the format of a query request. In google-cloud-bigquery, job configuration classes are provided, such as QueryJobConfig, which contain the necessary properties to configure complex jobs.
The following sample shows how to run a query with named parameters.
pandas-gbq:
importpandassql=""" SELECT name FROM `bigquery-public-data.usa_names.usa_1910_current` WHERE state = @state LIMIT @limit"""query_config={"query":{"parameterMode":"NAMED","queryParameters":[{"name":"state","parameterType":{"type":"STRING"},"parameterValue":{"value":"TX"},},{"name":"limit","parameterType":{"type":"INTEGER"},"parameterValue":{"value":100},},],}}df=pandas.read_gbq(sql,configuration=query_config)google-cloud-bigquery:
```python
from google.cloud import bigquery

client = bigquery.Client()
sql = """
    SELECT name
    FROM `bigquery-public-data.usa_names.usa_1910_current`
    WHERE state = @state
    LIMIT @limit
"""

query_config = bigquery.QueryJobConfig(
    query_parameters=[
        bigquery.ScalarQueryParameter("state", "STRING", "TX"),
        bigquery.ScalarQueryParameter("limit", "INTEGER", 100),
    ]
)

df = client.query(sql, job_config=query_config).to_dataframe()
```

Loading a pandas DataFrame to a BigQuery table
Both libraries support uploading data from a pandas DataFrame to a new table in BigQuery. Key differences include:
| | pandas-gbq | google-cloud-bigquery |
|---|---|---|
| Type support | Converts the DataFrame to CSV format before sending to the API, which does not support nested or array values. | Converts the DataFrame to Parquet or CSV format before sending to the API, which supports nested and array values. Choose Parquet for struct and array values and CSV for date and time serialization flexibility. Parquet is the default choice. Note that pyarrow, which is the Parquet engine used to send the DataFrame data to the BigQuery API, must be installed to load the DataFrame to a table. |
| Load configurations | You can optionally specify a table schema. | Use the LoadJobConfig class, which contains properties for the various API configuration options. |
pandas-gbq:
```python
import pandas

df = pandas.DataFrame(
    {
        "my_string": ["a", "b", "c"],
        "my_int64": [1, 2, 3],
        "my_float64": [4.0, 5.0, 6.0],
        "my_timestamp": [
            pandas.Timestamp("1998-09-04T16:03:14"),
            pandas.Timestamp("2010-09-13T12:03:45"),
            pandas.Timestamp("2015-10-02T16:00:00"),
        ],
    }
)
table_id = "my_dataset.new_table"

df.to_gbq(table_id)
```

google-cloud-bigquery:
The google-cloud-bigquery package requires the pyarrow library to serialize a pandas DataFrame to a Parquet file. Install the pyarrow package:
```sh
pip install pyarrow
```

```python
from google.cloud import bigquery
import pandas

df = pandas.DataFrame(
    {
        "my_string": ["a", "b", "c"],
        "my_int64": [1, 2, 3],
        "my_float64": [4.0, 5.0, 6.0],
        "my_timestamp": [
            pandas.Timestamp("1998-09-04T16:03:14"),
            pandas.Timestamp("2010-09-13T12:03:45"),
            pandas.Timestamp("2015-10-02T16:00:00"),
        ],
    }
)

client = bigquery.Client()
table_id = "my_dataset.new_table"

# Since string columns use the "object" dtype, pass in a (partial) schema
# to ensure the correct BigQuery data type.
job_config = bigquery.LoadJobConfig(
    schema=[
        bigquery.SchemaField("my_string", "STRING"),
    ]
)

job = client.load_table_from_dataframe(df, table_id, job_config=job_config)

# Wait for the load job to complete.
job.result()
```

Features not supported by pandas-gbq
While the pandas-gbq library provides a useful interface for querying data and writing data to tables, it does not cover many of the BigQuery API features, including but not limited to the following (a brief google-cloud-bigquery sketch follows the list):
- Managing datasets, including creating new datasets, updating dataset properties, and deleting datasets
- Loading data into BigQuery from formats other than pandas DataFrames, or from pandas DataFrames with JSON columns
- Managing tables, including listing tables in a dataset, copying table data, and deleting tables
- Exporting BigQuery data directly to Cloud Storage
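For these operations, use google-cloud-bigquery directly. The following is a minimal sketch of a few of them; the dataset, table, and bucket names are placeholders, and the linked how-to guides describe the full set of options.

```python
# Minimal sketch of operations that pandas-gbq does not cover, using
# google-cloud-bigquery. Dataset, table, and bucket names are placeholders.
from google.cloud import bigquery

client = bigquery.Client()

# Create a dataset.
dataset = client.create_dataset("my_new_dataset")

# List the tables in a dataset.
for table in client.list_tables("my_dataset"):
    print(table.table_id)

# Copy a table and wait for the copy job to finish.
copy_job = client.copy_table("my_dataset.source_table", "my_dataset.dest_table")
copy_job.result()

# Export a table directly to Cloud Storage and wait for the extract job.
extract_job = client.extract_table(
    "my_dataset.new_table", "gs://my-bucket/exported-*.csv"
)
extract_job.result()

# Delete a table and a dataset.
client.delete_table("my_dataset.dest_table")
client.delete_dataset("my_new_dataset", delete_contents=True)
```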
Troubleshooting connection pool errors
Error string: `Connection pool is full, discarding connection: bigquery.googleapis.com. Connection pool size: 10`
If you use the default BigQuery client object in Python, you are limited to a maximum of 10 threads because the default pool size for the Python HTTPAdapter is 10. To use more than 10 connections, create a custom requests.adapters.HTTPAdapter object. For example:
```python
import requests.adapters
from google.cloud import bigquery

client = bigquery.Client()

# Replace the default HTTP adapter with one that allows a larger connection pool.
adapter = requests.adapters.HTTPAdapter(
    pool_connections=128, pool_maxsize=128, max_retries=3
)
client._http.mount("https://", adapter)
client._http._auth_request.session.mount("https://", adapter)

query_job = client.query(QUERY)  # QUERY is a SQL query string defined elsewhere.
```