Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Dispatched sql from dry run does not match sql from Dataframe attribute #2191

Open
Assignees
jiaxunwu
Labels
api: bigqueryIssues related to the googleapis/python-bigquery-dataframes API.
@ClaudioSalvatoreArcidiacono

Description

I have noticed a wild discrepancy between the sql obtained fromDataFrame.sql and the dispatched sql fromDataFrame.to_pandas(dry_run=True). The sql code I get fromDataFrame.sql is much cleaner and, accordingly to the big query UI, it would consume much less bytes when executed (in my example 156.35 MB). The dispatched sql obtained viaDataFrame.to_pandas(dry_run=True) is much heavier and less optimised, requiring a full table scan (~2TB in my example). As far as I understood, the sql fromDataFrame.sql does not rely on any cached table.

It would be nice if the dispatched sql would be the optimised one in order to avoid a full table scan on partitioned tables.

Environment details

  • OS type and version: Debian GNU/Linux 11 (bullseye)
  • Python version: 3.10.16
  • pip version:pip 25.0.1
  • bigframes version:2.26.0
Python:3.10.16|packagedbyconda-forge| (main,Dec52024,14:16:10) [GCC13.3.0]bigframes==2.26.0google-cloud-bigquery==3.38.0pandas==2.3.3pyarrow==21.0.0sqlglot==27.28.1

Code example

importdatetimeimportbigframes.pandasasbpdbpd.options.bigquery.project="<redacted>"df=bpd.read_gbq("bigquery-public-data.crypto_bitcoin.transactions")# This table is partitioned by block_timestamp_monthselection=df[df.block_timestamp_month==datetime.date(2025,10,1)][["block_timestamp_month","size"]]dry_run=selection.to_pandas(dry_run=True)
dry_run[dry_run.index=="totalBytesProcessed"].values[0]#Output: 2365056405079# ~2.3TB
selection.sql

Outputs

SELECT`bfuid_col_12`AS`block_timestamp_month`,`bfuid_col_5`AS`size`FROM(SELECT`t0`.`size`,`t0`.`block_timestamp_month`,`t0`.`size`AS`bfuid_col_5`,`t0`.`block_timestamp_month`AS`bfuid_col_12`,`t0`.`block_timestamp_month`=DATE(2025,10,1)AS`bfuid_col_21`FROM (SELECT`size`,`block_timestamp_month`FROM`bigquery-public-data.crypto_bitcoin.transactions` FOR SYSTEM_TIMEAS OFTIMESTAMP('2025-10-23T12:47:48.073960+00:00'))AS`t0`WHERE`t0`.`block_timestamp_month`=DATE(2025,10,1))

Which gives an estimated processed bytes of 156.35 MB when pasted on the big query editor.

Checking the dispatched sql I get:

print(dry_run[dry_run.index=="dispatchedSql"].values[0])
SELECT`bfuid_col_3`AS`bfuid_col_3`,`bfuid_col_12`AS`bfuid_col_12`,`bfuid_col_5`AS`bfuid_col_5`FROM(SELECT`t1`.`bfuid_col_3`,`t1`.`bfuid_col_12`,`t1`.`bfuid_col_5`,`t1`.`bfuid_col_28`AS`bfuid_col_29`FROM (SELECT    ROW_NUMBER() OVER (ORDER BY CONCAT(        CAST(FARM_FINGERPRINT(          CONCAT(            CONCAT('\\', REPLACE(COALESCE(`t0`.`hash`,''),'\\','\\\\')),            CONCAT('\\', REPLACE(COALESCE(CAST(`t0`.`size`AS STRING),''),'\\','\\\\')),            CONCAT('\\', REPLACE(COALESCE(CAST(`t0`.`virtual_size`AS STRING),''),'\\','\\\\')),            CONCAT('\\', REPLACE(COALESCE(CAST(`t0`.`version`AS STRING),''),'\\','\\\\')),            CONCAT('\\', REPLACE(COALESCE(CAST(`t0`.`lock_time`AS STRING),''),'\\','\\\\')),            CONCAT('\\', REPLACE(COALESCE(`t0`.`block_hash`,''),'\\','\\\\')),            CONCAT('\\', REPLACE(COALESCE(CAST(`t0`.`block_number`AS STRING),''),'\\','\\\\')),            CONCAT('\\', REPLACE(COALESCE(CAST(`t0`.`block_timestamp`AS STRING),''),'\\','\\\\')),            CONCAT('\\',              REPLACE(COALESCE(CAST(`t0`.`block_timestamp_month`AS STRING),''),'\\','\\\\')            ),            CONCAT('\\', REPLACE(COALESCE(CAST(`t0`.`input_count`AS STRING),''),'\\','\\\\')),            CONCAT('\\', REPLACE(COALESCE(CAST(`t0`.`output_count`AS STRING),''),'\\','\\\\')),            CONCAT('\\', REPLACE(COALESCE(CAST(`t0`.`input_value`AS STRING),''),'\\','\\\\')),            CONCAT('\\', REPLACE(COALESCE(CAST(`t0`.`output_value`AS STRING),''),'\\','\\\\')),            CONCAT('\\', REPLACE(COALESCE(CAST(`t0`.`is_coinbase`AS STRING),''),'\\','\\\\')),            CONCAT('\\', REPLACE(COALESCE(CAST(`t0`.`fee`AS STRING),''),'\\','\\\\')),            CONCAT('\\', REPLACE(COALESCE(to_json_string(`t0`.`inputs`),''),'\\','\\\\')),            CONCAT('\\', REPLACE(COALESCE(to_json_string(`t0`.`outputs`),''),'\\','\\\\'))          )        )AS STRING),        CAST(FARM_FINGERPRINT(          CONCAT(            CONCAT(              CONCAT('\\', REPLACE(COALESCE(`t0`.`hash`,''),'\\','\\\\')),              CONCAT('\\', REPLACE(COALESCE(CAST(`t0`.`size`AS STRING),''),'\\','\\\\')),              CONCAT('\\', REPLACE(COALESCE(CAST(`t0`.`virtual_size`AS STRING),''),'\\','\\\\')),              CONCAT('\\', REPLACE(COALESCE(CAST(`t0`.`version`AS STRING),''),'\\','\\\\')),              CONCAT('\\', REPLACE(COALESCE(CAST(`t0`.`lock_time`AS STRING),''),'\\','\\\\')),              CONCAT('\\', REPLACE(COALESCE(`t0`.`block_hash`,''),'\\','\\\\')),              CONCAT('\\', REPLACE(COALESCE(CAST(`t0`.`block_number`AS STRING),''),'\\','\\\\')),              CONCAT('\\', REPLACE(COALESCE(CAST(`t0`.`block_timestamp`AS STRING),''),'\\','\\\\')),              CONCAT('\\',                REPLACE(COALESCE(CAST(`t0`.`block_timestamp_month`AS STRING),''),'\\','\\\\')              ),              CONCAT('\\', REPLACE(COALESCE(CAST(`t0`.`input_count`AS STRING),''),'\\','\\\\')),              CONCAT('\\', REPLACE(COALESCE(CAST(`t0`.`output_count`AS STRING),''),'\\','\\\\')),              CONCAT('\\', REPLACE(COALESCE(CAST(`t0`.`input_value`AS STRING),''),'\\','\\\\')),              CONCAT('\\', REPLACE(COALESCE(CAST(`t0`.`output_value`AS STRING),''),'\\','\\\\')),              CONCAT('\\', REPLACE(COALESCE(CAST(`t0`.`is_coinbase`AS STRING),''),'\\','\\\\')),              CONCAT('\\', REPLACE(COALESCE(CAST(`t0`.`fee`AS STRING),''),'\\','\\\\')),              CONCAT('\\', REPLACE(COALESCE(to_json_string(`t0`.`inputs`),''),'\\','\\\\')),              CONCAT('\\', REPLACE(COALESCE(to_json_string(`t0`.`outputs`),''),'\\','\\\\'))            ),'_'          )        )AS STRING),        CAST(RAND()AS STRING)      )ASC    )-1AS`bfuid_col_3`,`t0`.`size`AS`bfuid_col_5`,`t0`.`block_timestamp_month`AS`bfuid_col_12`,`t0`.`block_timestamp_month`=DATE(2025,10,1)AS`bfuid_col_21`,    ROW_NUMBER() OVER (ORDER BY CONCAT(        CAST(FARM_FINGERPRINT(          CONCAT(            CONCAT('\\', REPLACE(COALESCE(`t0`.`hash`,''),'\\','\\\\')),            CONCAT('\\', REPLACE(COALESCE(CAST(`t0`.`size`AS STRING),''),'\\','\\\\')),            CONCAT('\\', REPLACE(COALESCE(CAST(`t0`.`virtual_size`AS STRING),''),'\\','\\\\')),            CONCAT('\\', REPLACE(COALESCE(CAST(`t0`.`version`AS STRING),''),'\\','\\\\')),            CONCAT('\\', REPLACE(COALESCE(CAST(`t0`.`lock_time`AS STRING),''),'\\','\\\\')),            CONCAT('\\', REPLACE(COALESCE(`t0`.`block_hash`,''),'\\','\\\\')),            CONCAT('\\', REPLACE(COALESCE(CAST(`t0`.`block_number`AS STRING),''),'\\','\\\\')),            CONCAT('\\', REPLACE(COALESCE(CAST(`t0`.`block_timestamp`AS STRING),''),'\\','\\\\')),            CONCAT('\\',              REPLACE(COALESCE(CAST(`t0`.`block_timestamp_month`AS STRING),''),'\\','\\\\')            ),            CONCAT('\\', REPLACE(COALESCE(CAST(`t0`.`input_count`AS STRING),''),'\\','\\\\')),            CONCAT('\\', REPLACE(COALESCE(CAST(`t0`.`output_count`AS STRING),''),'\\','\\\\')),            CONCAT('\\', REPLACE(COALESCE(CAST(`t0`.`input_value`AS STRING),''),'\\','\\\\')),            CONCAT('\\', REPLACE(COALESCE(CAST(`t0`.`output_value`AS STRING),''),'\\','\\\\')),            CONCAT('\\', REPLACE(COALESCE(CAST(`t0`.`is_coinbase`AS STRING),''),'\\','\\\\')),            CONCAT('\\', REPLACE(COALESCE(CAST(`t0`.`fee`AS STRING),''),'\\','\\\\')),            CONCAT('\\', REPLACE(COALESCE(to_json_string(`t0`.`inputs`),''),'\\','\\\\')),            CONCAT('\\', REPLACE(COALESCE(to_json_string(`t0`.`outputs`),''),'\\','\\\\'))          )        )AS STRING),        CAST(FARM_FINGERPRINT(          CONCAT(            CONCAT(              CONCAT('\\', REPLACE(COALESCE(`t0`.`hash`,''),'\\','\\\\')),              CONCAT('\\', REPLACE(COALESCE(CAST(`t0`.`size`AS STRING),''),'\\','\\\\')),              CONCAT('\\', REPLACE(COALESCE(CAST(`t0`.`virtual_size`AS STRING),''),'\\','\\\\')),              CONCAT('\\', REPLACE(COALESCE(CAST(`t0`.`version`AS STRING),''),'\\','\\\\')),              CONCAT('\\', REPLACE(COALESCE(CAST(`t0`.`lock_time`AS STRING),''),'\\','\\\\')),              CONCAT('\\', REPLACE(COALESCE(`t0`.`block_hash`,''),'\\','\\\\')),              CONCAT('\\', REPLACE(COALESCE(CAST(`t0`.`block_number`AS STRING),''),'\\','\\\\')),              CONCAT('\\', REPLACE(COALESCE(CAST(`t0`.`block_timestamp`AS STRING),''),'\\','\\\\')),              CONCAT('\\',                REPLACE(COALESCE(CAST(`t0`.`block_timestamp_month`AS STRING),''),'\\','\\\\')              ),              CONCAT('\\', REPLACE(COALESCE(CAST(`t0`.`input_count`AS STRING),''),'\\','\\\\')),              CONCAT('\\', REPLACE(COALESCE(CAST(`t0`.`output_count`AS STRING),''),'\\','\\\\')),              CONCAT('\\', REPLACE(COALESCE(CAST(`t0`.`input_value`AS STRING),''),'\\','\\\\')),              CONCAT('\\', REPLACE(COALESCE(CAST(`t0`.`output_value`AS STRING),''),'\\','\\\\')),              CONCAT('\\', REPLACE(COALESCE(CAST(`t0`.`is_coinbase`AS STRING),''),'\\','\\\\')),              CONCAT('\\', REPLACE(COALESCE(CAST(`t0`.`fee`AS STRING),''),'\\','\\\\')),              CONCAT('\\', REPLACE(COALESCE(to_json_string(`t0`.`inputs`),''),'\\','\\\\')),              CONCAT('\\', REPLACE(COALESCE(to_json_string(`t0`.`outputs`),''),'\\','\\\\'))            ),'_'          )        )AS STRING),        CAST(RAND()AS STRING)      )ASC    )-1AS`bfuid_col_28`FROM (SELECT`hash`,`size`,`virtual_size`,`version`,`lock_time`,`block_hash`,`block_number`,`block_timestamp`,`block_timestamp_month`,`input_count`,`output_count`,`input_value`,`output_value`,`is_coinbase`,`fee`,`inputs`,`outputs`FROM`bigquery-public-data.crypto_bitcoin.transactions` FOR SYSTEM_TIMEAS OFTIMESTAMP('2025-10-23T12:47:48.073960+00:00')  )AS`t0`)AS`t1`WHERE`t1`.`bfuid_col_21`)ORDER BY`bfuid_col_29`ASC NULLS LAST

Metadata

Metadata

Assignees

Labels

api: bigqueryIssues related to the googleapis/python-bigquery-dataframes API.

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions


    [8]ページ先頭

    ©2009-2025 Movatter.jp