Use open source Python libraries
You can choose from among three Python libraries in BigQuery, based on your use case.
| Library | Use case | Maintained by | Description |
|---|---|---|---|
| BigQuery DataFrames | Python-based data processing and ML operations with server-side processing (for example, using slots) | Open source library maintained by Google | pandas and scikit-learn APIs implemented with server-side pushdown. For more information, see Introduction to BigQuery DataFrames. |
| pandas-gbq | Python-based data processing using client-side data copy | Open source library maintained by PyData and volunteer contributors | Lets you move data to and from Python DataFrames on the client side. For more information, see the documentation and source code. |
| google-cloud-bigquery | BigQuery deployment, administration, and SQL-based querying | Open source library maintained by Google | Python package that wraps all the BigQuery APIs. For more information, see the documentation and source code. |
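BigQuery DataFrames is not covered in the rest of this guide, but a minimal, illustrative sketch of its pandas-like API looks like the following. It assumes the open source bigframes package is installed and default application credentials are configured; see Introduction to BigQuery DataFrames for the authoritative examples.

```python
# Minimal, illustrative sketch of BigQuery DataFrames (the bigframes package).
# Assumes `pip install bigframes` and default application credentials.
import bigframes.pandas as bpd

# Read a public table; the data stays in BigQuery and operations are
# pushed down and executed server side (using slots), not on the client.
df = bpd.read_gbq("bigquery-public-data.usa_names.usa_1910_current")

# Filtering and aggregation also run in BigQuery.
tx_names = df[df["state"] == "TX"].groupby("name")["number"].sum()

# Materialize a small result on the client as a regular pandas object.
print(tx_names.head(10).to_pandas())
```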
Using pandas-gbq and google-cloud-bigquery
The pandas-gbq library provides a simple interface for running queries and uploading pandas DataFrames to BigQuery. It is a thin wrapper around the BigQuery client library, google-cloud-bigquery. Both of these libraries focus on helping you perform data analysis using SQL.
Install the libraries
To use the code samples in this guide, install the pandas-gbq package and the BigQuery Python client libraries.
Install the pandas-gbq and google-cloud-bigquery packages:
```sh
pip install --upgrade pandas-gbq 'google-cloud-bigquery[bqstorage,pandas]'
```

Running queries
Both libraries support querying data stored in BigQuery. Key differences between the libraries include:
| | pandas-gbq | google-cloud-bigquery |
|---|---|---|
| Default SQL syntax | GoogleSQL (configurable with pandas_gbq.context.dialect) | GoogleSQL |
| Query configurations | Sent as a dictionary in the format of a query request. | Use the QueryJobConfig class, which contains properties for the various API configuration options. |
Querying data with the GoogleSQL syntax
The following sample shows how to run a GoogleSQL query with and without explicitly specifying a project. For both libraries, if a project is not specified, the project will be determined from the default credentials.
Note: The pandas.read_gbq method defaults to legacy SQL. To use standard SQL, you must explicitly set the dialect parameter to 'standard', as shown.

pandas-gbq:
importpandassql=""" SELECT name FROM `bigquery-public-data.usa_names.usa_1910_current` WHERE state = 'TX' LIMIT 100"""# Run a Standard SQL query using the environment's default projectdf=pandas.read_gbq(sql,dialect="standard")# Run a Standard SQL query with the project set explicitlyproject_id="your-project-id"df=pandas.read_gbq(sql,project_id=project_id,dialect="standard")google-cloud-bigquery:
```python
from google.cloud import bigquery

client = bigquery.Client()
sql = """
    SELECT name
    FROM `bigquery-public-data.usa_names.usa_1910_current`
    WHERE state = 'TX'
    LIMIT 100
"""

# Run a Standard SQL query using the environment's default project
df = client.query(sql).to_dataframe()

# Run a Standard SQL query with the project set explicitly
project_id = "your-project-id"
df = client.query(sql, project=project_id).to_dataframe()
```

Querying data with the legacy SQL syntax
The following sample shows how to run a query using legacy SQL syntax. See the GoogleSQL migration guide for guidance on updating your queries to GoogleSQL.
pandas-gbq:
importpandassql=""" SELECT name FROM [bigquery-public-data:usa_names.usa_1910_current] WHERE state = 'TX' LIMIT 100"""df=pandas.read_gbq(sql,dialect="legacy")google-cloud-bigquery:
Using the BigQuery Storage API to download large results
Use the BigQuery Storage API to speed up downloads of large results by 15 to 31 times.
pandas-gbq:
importpandassql="SELECT * FROM `bigquery-public-data.irs_990.irs_990_2012`"# Use the BigQuery Storage API to download results more quickly.df=pandas.read_gbq(sql,dialect="standard",use_bqstorage_api=True)google-cloud-bigquery:
fromgoogle.cloudimportbigqueryclient=bigquery.Client()sql="SELECT * FROM `bigquery-public-data.irs_990.irs_990_2012`"# The client library uses the BigQuery Storage API to download results to a# pandas dataframe if the API is enabled on the project, the# `google-cloud-bigquery-storage` package is installed, and the `pyarrow`# package is installed.df=client.query(sql).to_dataframe()Running a query with a configuration
Sending a configuration with a BigQuery API request is required to perform certain complex operations, such as running a parameterized query or specifying a destination table to store the query results. In pandas-gbq, the configuration must be sent as a dictionary in the format of a query request. In google-cloud-bigquery, job configuration classes are provided, such as QueryJobConfig, which contain the necessary properties to configure complex jobs.
The following sample shows how to run a query with named parameters.
pandas-gbq:
importpandassql=""" SELECT name FROM `bigquery-public-data.usa_names.usa_1910_current` WHERE state = @state LIMIT @limit"""query_config={"query":{"parameterMode":"NAMED","queryParameters":[{"name":"state","parameterType":{"type":"STRING"},"parameterValue":{"value":"TX"},},{"name":"limit","parameterType":{"type":"INTEGER"},"parameterValue":{"value":100},},],}}df=pandas.read_gbq(sql,configuration=query_config)google-cloud-bigquery:
```python
from google.cloud import bigquery

client = bigquery.Client()
sql = """
    SELECT name
    FROM `bigquery-public-data.usa_names.usa_1910_current`
    WHERE state = @state
    LIMIT @limit
"""

query_config = bigquery.QueryJobConfig(
    query_parameters=[
        bigquery.ScalarQueryParameter("state", "STRING", "TX"),
        bigquery.ScalarQueryParameter("limit", "INTEGER", 100),
    ]
)

df = client.query(sql, job_config=query_config).to_dataframe()
```

Loading a pandas DataFrame to a BigQuery table
Both libraries support uploading data from a pandas DataFrame to a new table in BigQuery. Key differences include:
| | pandas-gbq | google-cloud-bigquery |
|---|---|---|
| Type support | Converts the DataFrame to CSV format before sending to the API, which does not support nested or array values. | Converts the DataFrame to Parquet or CSV format before sending to the API, which supports nested and array values. Choose Parquet for struct and array values and CSV for date and time serialization flexibility. Parquet is the default choice. Note that pyarrow, which is the Parquet engine used to send the DataFrame data to the BigQuery API, must be installed to load the DataFrame to a table. |
| Load configurations | You can optionally specify a table schema. | Use the LoadJobConfig class, which contains properties for the various API configuration options. |
pandas-gbq:
```python
import pandas

df = pandas.DataFrame(
    {
        "my_string": ["a", "b", "c"],
        "my_int64": [1, 2, 3],
        "my_float64": [4.0, 5.0, 6.0],
        "my_timestamp": [
            pandas.Timestamp("1998-09-04T16:03:14"),
            pandas.Timestamp("2010-09-13T12:03:45"),
            pandas.Timestamp("2015-10-02T16:00:00"),
        ],
    }
)
table_id = "my_dataset.new_table"

df.to_gbq(table_id)
```

google-cloud-bigquery:
The google-cloud-bigquery package requires the pyarrow library to serialize a pandas DataFrame to a Parquet file. Install the pyarrow package:
```sh
pip install pyarrow
```

```python
from google.cloud import bigquery
import pandas

df = pandas.DataFrame(
    {
        "my_string": ["a", "b", "c"],
        "my_int64": [1, 2, 3],
        "my_float64": [4.0, 5.0, 6.0],
        "my_timestamp": [
            pandas.Timestamp("1998-09-04T16:03:14"),
            pandas.Timestamp("2010-09-13T12:03:45"),
            pandas.Timestamp("2015-10-02T16:00:00"),
        ],
    }
)

client = bigquery.Client()
table_id = "my_dataset.new_table"

# Since string columns use the "object" dtype, pass in a (partial) schema
# to ensure the correct BigQuery data type.
job_config = bigquery.LoadJobConfig(
    schema=[
        bigquery.SchemaField("my_string", "STRING"),
    ]
)

job = client.load_table_from_dataframe(df, table_id, job_config=job_config)

# Wait for the load job to complete.
job.result()
```

Features not supported by pandas-gbq
While the pandas-gbq library provides a useful interface for querying data and writing data to tables, it does not cover many of the BigQuery API features, including but not limited to the following (a brief google-cloud-bigquery sketch follows the list):
- Managing datasets, including creating new datasets, updating dataset properties, and deleting datasets
- Loading data into BigQuery from formats other than pandas DataFrames, or from pandas DataFrames with JSON columns
- Managing tables, including listing tables in a dataset, copying table data, and deleting tables
- Exporting BigQuery data directly to Cloud Storage
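For these operations, use google-cloud-bigquery directly. The following is a minimal sketch of a few of them; the dataset, table, and bucket names are placeholders, and the linked how-to guides describe the full set of options.

```python
# Minimal sketch of operations that pandas-gbq does not cover, using
# google-cloud-bigquery. Dataset, table, and bucket names are placeholders.
from google.cloud import bigquery

client = bigquery.Client()

# Create a dataset.
dataset = client.create_dataset("my_new_dataset")

# List the tables in a dataset.
for table in client.list_tables("my_dataset"):
    print(table.table_id)

# Copy a table and wait for the copy job to finish.
copy_job = client.copy_table("my_dataset.source_table", "my_dataset.dest_table")
copy_job.result()

# Export a table directly to Cloud Storage and wait for the extract job.
extract_job = client.extract_table(
    "my_dataset.new_table", "gs://my-bucket/exported-*.csv"
)
extract_job.result()

# Delete a table and a dataset.
client.delete_table("my_dataset.dest_table")
client.delete_dataset("my_new_dataset", delete_contents=True)
```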
Troubleshooting connection pool errors
Error string: `Connection pool is full, discarding connection: bigquery.googleapis.com. Connection pool size: 10`
If you use the default BigQuery client object in Python, you are limited to a maximum of 10 threads because the default pool size for the Python HTTPAdapter is 10. To use more than 10 connections, create a custom requests.adapters.HTTPAdapter object. For example:
```python
import requests.adapters
from google.cloud import bigquery

client = bigquery.Client()

# Replace the default HTTP adapter with one that allows a larger connection pool.
adapter = requests.adapters.HTTPAdapter(
    pool_connections=128, pool_maxsize=128, max_retries=3
)
client._http.mount("https://", adapter)
client._http._auth_request.session.mount("https://", adapter)

query_job = client.query(QUERY)  # QUERY is a SQL query string defined elsewhere.
```