expose write options #1006
Conversation
#[derive(FromPyObject)]
#[pyo3(from_item_all)]
pub struct PyDataFrameWriteOptions {
    insert_operation: Option<String>,
If you wanted, you could use an `Enum` here instead of a `String`, and then defer more of the validation to pyo3 instead of manually checking the string values below.
Thanks for your comment. I did initially try to do this with an enum. Unfortunately, to use an enum in an object that derives `FromPyObject`, the enum itself also has to derive `FromPyObject`, but tagging a simple enum (no data, just unit variants) with `#[derive(FromPyObject)]` results in `error: cannot derive FromPyObject for empty structs and variants`.
So this errors:

    #[derive(FromPyObject)]
    pub enum PyInsertOperation {
        Insert,
        Overwrite,
        Replace,
    }

This might be where my pyo3 knowledge is lacking. Can you point me in the right direction on how to do this as an enum properly?
I think I usually separate out the enum but then implement `FromPyObject` directly on that enum. It's simpler with better separation of concerns, I think. And if you ever need to use `PyInsertOperation` from multiple functions, then you can reuse the same implementation.
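For illustration, here is a minimal, std-only sketch of the validation that a hand-written `FromPyObject` impl could delegate to. It is not the code from this PR. The accepted spellings (`insert`, `overwrite`, `replace`, matched case-insensitively) and the `ParseInsertOperationError` type are assumptions made for the example. The pyo3 glue itself is left out because the `FromPyObject` method signature differs between pyo3 versions; the idea would be to extract a `String` from the Python object and pass it to `parse()`.

```rust
use std::fmt;
use std::str::FromStr;

/// Variants copied from the enum quoted in the comment above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PyInsertOperation {
    Insert,
    Overwrite,
    Replace,
}

/// Hypothetical error type for this sketch. Real binding code would
/// convert it into a Python exception (for example a ValueError).
#[derive(Debug, PartialEq, Eq)]
pub struct ParseInsertOperationError(String);

impl fmt::Display for ParseInsertOperationError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "invalid insert operation {:?}, expected insert, overwrite or replace",
            self.0
        )
    }
}

impl std::error::Error for ParseInsertOperationError {}

impl FromStr for PyInsertOperation {
    type Err = ParseInsertOperationError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Assumption: surrounding whitespace is ignored and the value is
        // compared case-insensitively against the variant names.
        match s.trim().to_ascii_lowercase().as_str() {
            "insert" => Ok(Self::Insert),
            "overwrite" => Ok(Self::Overwrite),
            "replace" => Ok(Self::Replace),
            _ => Err(ParseInsertOperationError(s.to_string())),
        }
    }
}
```

With the parsing logic on the enum, every function that accepts an insert operation can share the same validation and error message, which is the reuse argument made above.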
Which issue does this PR close?
Closes #1005.
This is still a draft. Todo list:
DataFrameWriteOptions
What changes are included in this PR?
Generic write options
`dataframe.write_csv(...)`, `dataframe.write_json(...)` and `dataframe.write_parquet` now take an additional optional argument `write_options`, corresponding to `DataFrameWriteOptions` in datafusion. This is a dictionary with the following optional keys:
`InsertOp` in datafusion.
Csv
Todo..
JSON
Todo..
Parquet
Todo..
Are there any user-facing changes?
The API changes described above. These should be backwards compatible.