expose write options #1006

Draft
matko wants to merge 1 commit into apache:main from vectorlink-ai:feat/expose-write-options

matko commented Jan 29, 2025 (edited)

Which issue does this PR close?

Closes #1005.

This is still a draft. Todo list:

  • DataFrameWriteOptions
  • CsvOptions
  • JsonOptions
  • TableParquetOptions

What changes are included in this PR?

Generic write options

dataframe.write_csv(...), dataframe.write_json(...) and dataframe.write_parquet(...) now take an additional optional argument, write_options, corresponding to DataFrameWriteOptions in datafusion. This is a dictionary with the following optional keys:

  • "insert_operation": one of "append", "overwrite" or "replace", corresponding to InsertOp in datafusion.
  • "single_file_option": a boolean
  • "partition_by": list of strings, names of columns to hive partition on
  • "sort_by": list of sort expressions
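
Because every key in the dictionary is optional, the receiving side has to fall back to a default for anything the caller leaves out. The following is a minimal, standard-library-only sketch of that idea, not the code in this PR: the names WriteOptionsInput, ResolvedWriteOptions and resolve are hypothetical, the default values are assumptions, and sort_by is omitted because the type of a sort expression is not specified here.

```rust
/// Hypothetical mirror of the `write_options` dictionary: every key is optional.
/// `sort_by` is omitted because its element type is not specified in this PR text.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct WriteOptionsInput {
    pub insert_operation: Option<String>,
    pub single_file_option: Option<bool>,
    pub partition_by: Option<Vec<String>>,
}

/// Hypothetical fully resolved options, after defaults have been applied.
#[derive(Debug, Clone, PartialEq)]
pub struct ResolvedWriteOptions {
    pub insert_operation: String,
    pub single_file_output: bool,
    pub partition_by: Vec<String>,
}

impl Default for ResolvedWriteOptions {
    fn default() -> Self {
        // Assumed defaults for illustration: append, multiple files, no partitioning.
        Self {
            insert_operation: "append".to_string(),
            single_file_output: false,
            partition_by: Vec::new(),
        }
    }
}

/// Keep each value the caller supplied and use the default for missing keys.
pub fn resolve(input: WriteOptionsInput) -> ResolvedWriteOptions {
    let defaults = ResolvedWriteOptions::default();
    ResolvedWriteOptions {
        insert_operation: input.insert_operation.unwrap_or(defaults.insert_operation),
        single_file_output: input
            .single_file_option
            .unwrap_or(defaults.single_file_output),
        partition_by: input.partition_by.unwrap_or(defaults.partition_by),
    }
}
```

With this shape, a caller that sets only "single_file_option" still gets a complete set of options, which is what keeps the extra argument backwards compatible.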

Csv

Todo..

JSON

Todo..

Parquet

Todo..

Are there any user-facing changes?

The API changes described above. These should be backwards compatible.

#[derive(FromPyObject)]
#[pyo3(from_item_all)]
pub struct PyDataFrameWriteOptions {
    insert_operation: Option<String>,

kylebarron (Contributor) commented:

If you wanted, you could use an Enum here instead of a String, and then defer more of the validation to pyo3 instead of manually checking the string values below.

matko (Author) commented Jan 30, 2025 (edited):

Thanks for your comment. I did initially try to do this with an enum. Unfortunately, to use an enum in an object that derives FromPyObject, the enum itself also has to derive FromPyObject, but tagging a simple enum (no variants, just tags) with #[derive(FromPyObject)] results in error: cannot derive FromPyObject for empty structs and variants.

So this errors:

#[derive(FromPyObject)]
pub enum PyInsertOperation {
    Insert,
    Overwrite,
    Replace,
}

This might be where my pyo3 knowledge is lacking. Can you point me in the right direction on how to do this as an enum properly?

kylebarron (Contributor) commented:

I think I usually separate out the enum but then implement FromPyObject directly on that enum. It's simpler, with better separation of concerns, I think. And if you ever need to use PyInsertOperation from multiple functions, you can reuse the same implementation.
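
As a rough illustration of that separation, the string validation can live on the enum itself. The sketch below uses only the standard library and is an assumption about how this could look, not the code in this PR: the enum name InsertOperation and its variants are hypothetical, chosen to match the three "insert_operation" strings in the description. The pyo3 side is deliberately left out; a hand-written FromPyObject implementation would presumably extract a string from the Python object and delegate to a parser like this one.

```rust
use std::str::FromStr;

/// Hypothetical enum mirroring the three accepted "insert_operation" strings.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InsertOperation {
    Append,
    Overwrite,
    Replace,
}

impl FromStr for InsertOperation {
    type Err = String;

    /// Parse one of the accepted option strings; anything else is an error
    /// that names the rejected value and lists the valid choices.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "append" => Ok(Self::Append),
            "overwrite" => Ok(Self::Overwrite),
            "replace" => Ok(Self::Replace),
            other => Err(format!(
                "invalid insert_operation {other:?}; expected \"append\", \"overwrite\" or \"replace\""
            )),
        }
    }
}
```

Keeping the parsing in one place means any function that accepts an insert operation can reuse it, for example via "overwrite".parse::<InsertOperation>().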

Reviewers

kylebarron left review comments

Development

Successfully merging this pull request may close these issues.

Expose all write options

2 participants: @matko, @kylebarron