Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Enhance Arrow to Pandas conversion with type overrides and additional kwargs#579

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Open
madhav-db wants to merge5 commits intomain
base:main
Choose a base branch
Loading
fromissue-578

Conversation

@madhav-db
Copy link
Contributor

@madhav-dbmadhav-db commentedMay 30, 2025
edited
Loading

  • Introduced _arrow_pandas_type_override and _arrow_to_pandas_kwargs in Connection class for customizable dtype mapping and DataFrame construction parameters.
  • Updated ResultSet to utilize these new options during conversion from Arrow tables to Pandas DataFrames.
  • Added unit tests to validate the new functionality, including scenarios for type overrides and additional kwargs handling.

What type of PR is this?

  • Refactor
  • Feature
  • Bug Fix
  • Other

Description

How is this tested?

  • Unit tests
  • E2E Tests
  • Manually
  • N/A

Related Tickets & Documents

#578

j-bennet reacted with hooray emoji
… kwargs* Introduced _arrow_pandas_type_override and _arrow_to_pandas_kwargs in Connection class for customizable dtype mapping and DataFrame construction parameters.* Updated ResultSet to utilize these new options during conversion from Arrow tables to Pandas DataFrames.* Added unit tests to validate the new functionality, including scenarios for type overrides and additional kwargs handling.
@github-actions
Copy link

Thanks for your contribution! To satisfy the DCO policy in ourcontributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase (git rebase -i main).

@madhav-dbmadhav-db changed the titleEnhance Arrow to Pandas conversion with type overrides and additional kwargs [#578]Enhance Arrow to Pandas conversion with type overrides and additional kwargsMay 30, 2025
@github-actions
Copy link

Thanks for your contribution! To satisfy the DCO policy in ourcontributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase (git rebase -i main).

@github-actions
Copy link

Thanks for your contribution! To satisfy the DCO policy in ourcontributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase (git rebase -i main).

@github-actions
Copy link

Thanks for your contribution! To satisfy the DCO policy in ourcontributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase (git rebase -i main).

@github-actions
Copy link

Thanks for your contribution! To satisfy the DCO policy in ourcontributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase (git rebase -i main).

@j-bennet
Copy link

@madhav-db Very interested in seeing this pass. 🤞

@vikrantpuppalavikrantpuppala removed their request for reviewJuly 8, 2025 03:36
}

arrow_pandas_type_override=self.connection._arrow_pandas_type_override
ifnotisinstance(arrow_pandas_type_override,dict):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This if block is not needed, let it fail here itself. Don't want the user to give something incorrect and then everything works. This is a new change and nothing to be backward compatible.
just leave it at this -arrow_pandas_type_override = self.connection._arrow_pandas_type_override

}

arrow_to_pandas_kwargs=self.connection._arrow_to_pandas_kwargs
ifisinstance(arrow_to_pandas_kwargs,dict):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Same let it fail when the input format is incorrect. The default python interpreter error of type mismatch is enough

self.assertIsInstance(result_default[0].col_ts,datetime.datetime)


if__name__=="__main__":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

this is not needed, as we run tests using pytest. Also can you move everything to pytest and remove unittest


@pytest.mark.skipif(paisNone,reason="PyArrow is not installed")
classArrowConversionTests(unittest.TestCase):
@staticmethod
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Move this to use fixtures

conn._arrow_to_pandas_kwargs= {}
returnconn

@staticmethod
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Use fixtures or just normal functions, don't need static methods

self.assertEqual(result[0].col_int,1.0)
self.assertEqual(result[0].col_str,"a")

deftest_convert_arrow_table_to_pandas_kwargs(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Too much code duplication. Can you create a parameterized test, where in thismock_connection._arrow_to_pandas_kwargs = {"timestamp_as_object": False} and the assertIsInstance values are parameterized. Otherwise the checking part looks the same and is copied repeatedly

er.is_staging_operation=False
returner

deftest_convert_arrow_table_default(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

The test_convert_arrow_table_deafult, test_convert_arrow_table_disable_pandas and test_convert_arrow_table_type_override are essentially the same test flow just with different arguments. Plz use pytest's parameterized tests for such tests where only arguments change

@jprakash-db
Copy link
Contributor

Plz merge the master once, as the last commit is pretty old

Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment

Reviewers

@jprakash-dbjprakash-dbjprakash-db left review comments

@deeksha-dbdeeksha-dbAwaiting requested review from deeksha-db

@samikshya-dbsamikshya-dbAwaiting requested review from samikshya-db

@jackyhu-dbjackyhu-dbAwaiting requested review from jackyhu-db

@gopalldbgopalldbAwaiting requested review from gopalldb

@jayantsing-dbjayantsing-dbAwaiting requested review from jayantsing-db

@shivam2680shivam2680Awaiting requested review from shivam2680

At least 1 approving review is required to merge this pull request.

Assignees

No one assigned

Labels

None yet

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

4 participants

@madhav-db@j-bennet@jprakash-db

[8]ページ先頭

©2009-2025 Movatter.jp