Commit f7d3865

jayantsing-db authored and varun-edachali-dbx committed

Refactor decimal conversion in PyArrow tables to use direct casting (#544)

This PR replaces the previous implementation of convert_decimals_in_arrow_table() with a more efficient approach that uses PyArrow's native casting operation instead of going through pandas conversion and array creation.

- Remove conversion to pandas DataFrame via to_pandas() and apply() methods
- Remove intermediate steps of creating an array from the decimal column and setting it back
- Replace with direct type casting using PyArrow's cast() method
- Build a new table with transformed columns rather than modifying the original table
- Create a new schema based on the modified fields

The new approach is more performant because it avoids the pandas conversion overhead. The benchmark below shows substantial improvements when retrieving all rows from a table containing decimal columns, particularly when compression is disabled. Even greater gains were observed with compression enabled, showing approximately an 84% improvement (6 seconds compared to 39 seconds). Benchmarking was performed against e2-dogfood, with the client located in the us-west-2 region.

![image](https://github.com/user-attachments/assets/5407b651-8ab6-4c13-b525-cf912f503ba0)

Signed-off-by: Jayant Singh <jayant.singh@databricks.com>
Signed-off-by: varun-edachali-dbx <varun.edachali@databricks.com>

1 parent 8f7754b commit f7d3865

File tree

1 file changed: +19 -9 lines changed

src/databricks/sql/utils.py

Lines changed: 19 additions & 9 deletions
```diff
@@ -611,21 +611,31 @@ def convert_arrow_based_set_to_arrow_table(arrow_batches, lz4_compressed, schema
 
 
 def convert_decimals_in_arrow_table(table, description) -> "pyarrow.Table":
+    new_columns = []
+    new_fields = []
+
     for i, col in enumerate(table.itercolumns()):
+        field = table.field(i)
+
         if description[i][1] == "decimal":
-            decimal_col = col.to_pandas().apply(
-                lambda v: v if v is None else Decimal(v)
-            )
             precision, scale = description[i][4], description[i][5]
             assert scale is not None
             assert precision is not None
-            # Spark limits decimal to a maximum scale of 38,
-            # so 128 is guaranteed to be big enough
+            # create the target decimal type
             dtype = pyarrow.decimal128(precision, scale)
-            col_data = pyarrow.array(decimal_col, type=dtype)
-            field = table.field(i).with_type(dtype)
-            table = table.set_column(i, field, col_data)
-    return table
+
+            new_col = col.cast(dtype)
+            new_field = field.with_type(dtype)
+
+            new_columns.append(new_col)
+            new_fields.append(new_field)
+        else:
+            new_columns.append(col)
+            new_fields.append(field)
+
+    new_schema = pyarrow.schema(new_fields)
+
+    return pyarrow.Table.from_arrays(new_columns, schema=new_schema)
 
 
 def convert_to_assigned_datatypes_in_column_table(column_table, description):
```

0 commit comments
